CONVERGENCE OF MULTIPLE ERGODIC AVERAGES 



BERNARD HOST 

Abstract. These notes are based on a course for a general audience
given at the Centro de Modelamiento Matematico of the
University of Chile, in December 2004.

We study the mean convergence of multiple ergodic averages, 
that is, averages of a product of functions taken at different times. 
We also describe the relations between this area of ergodic the- 
ory and some classical and some recent results in additive number 
theory. 



In this paper we present some recent theorems of convergence of mul- 
tiple ergodic averages. While the classical Ergodic Theorem describes 
the limit behavior of the time averages of a function, these theorems 
deal with averages of a product of functions taken at different times. 
We essentially focus on the case that the k functions are taken at times
n, 2n, . . . , kn (Theorem 1.2) but we also consider the case of
polynomial times.

These convergence results belong to the field initiated by Furstenberg,
exploring recurrence properties in ergodic theory and their relations
with combinatorial number theory. The Correspondence Principle
(see Section 1.2) provides a bridge between these two domains by
allowing one to deduce combinatorial properties of sets of integers with 
positive upper density from recurrence theorems. For example, Sze- 
meredi's Theorem on the existence of arithmetic progressions in sets 
of positive upper density corresponds to Furstenberg's Theorem about 
multiple recurrence along arithmetic progressions. The original proof 
of Szemeredi's Theorem is purely combinatorial and some of its
generalizations also have combinatorial proofs, apparently completely
different from the ergodic ones. But recent progress in both fields leads to the
intuition that there exists a hidden relation between the objects and 
methods of the two areas. Understanding this relation more precisely 
could be an interesting challenge. For this reason we do not completely 
ignore the combinatorial point of view in this paper. 

As often happens, the convergence theorems for multiple ergodic averages
will probably receive short self-contained proofs sometime in the
future. But this is not the case at present, and it would be
impossible to give complete proofs within the framework of
this paper. Our more limited goal is to present most of the necessary
tools and to summarize the main steps. Some partial results are given
complete or partial proofs. These are neither the most difficult nor
the "most important", but they are chosen for the light they
shed on some of the main ingredients and on their uses.

We hope that the majority of this text is accessible to a reader with
a minimal knowledge of ergodic theory. For easier reading some notes
have been postponed to the end of the sections and further comments
are included in the Appendix. This material is not used in the main
text and is intended as supplemental material. The paper uses a lot
of notation and some of it is not classical. We tried to keep notation
as similar as possible to the original papers referred to. For easier
reading, we introduce the notation only when it is needed. However,
it should be noted that throughout we use the symbol N with its
European meaning: N = {0, 1, 2, . . . }.

1. Context and Results 

1.1. Szemeredi's Theorem. Our starting point is a celebrated the- 
orem by Szemeredi. We begin with some definitions. 

Definition. The upper density of a subset E of N is:

\[ d^*(E) = \limsup_{N\to+\infty} \frac{1}{N}\,\mathrm{Card}\bigl(E \cap \{0, 1, \dots, N-1\}\bigr) . \]

An arithmetic progression of length k is a set of integers of the form

\[ \{a, a+d, \dots, a+(k-1)d\} \]

where a, d are integers and d > 0.

Szemeredi's Theorem ([Sz2]). A subset of integers with positive upper
density contains arbitrarily long arithmetic progressions.

The result was first conjectured by Erdos and Turan [ET] in 1936
and was proved by Roth [Ro] in 1953 for progressions of length 3 and by
Szemeredi [Sz1] in 1969 for progressions of length 4. While Roth's proof
belongs to harmonic analysis, Szemeredi's method is combinatorial and
relies on graph theory.

This theorem can be reformulated in terms of finite sets:

Theorem 1.1. For every integer $k \geq 2$ and every real $\delta > 0$ there
exists an integer $N(k, \delta)$ such that:

For every $N \geq N(k, \delta)$, every subset E of {1, 2, . . . , N} with
$\mathrm{Card}(E) \geq \delta N$ contains an arithmetic progression of length k.

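For small N the finite statement can be explored by brute force. The following Python sketch is purely illustrative and is not part of the original notes; the function names `has_progression` and `largest_progression_free` are ours. It tests whether a finite set contains an arithmetic progression of length k and computes, by exhaustive search, the largest size of a subset of {1, . . . , N} containing no such progression.

```python
from itertools import combinations

def has_progression(E, k):
    """Return True if the finite set E contains an arithmetic progression
    {a, a+d, ..., a+(k-1)d} with d > 0."""
    S = set(E)
    if k <= 1:
        return len(S) >= k
    hi = max(S) if S else 0
    for a in S:
        # the common difference d satisfies a + (k-1)d <= max(S)
        for d in range(1, (hi - a) // (k - 1) + 1):
            if all(a + j * d in S for j in range(k)):
                return True
    return False

def largest_progression_free(N, k):
    """Largest cardinality of a subset of {1, ..., N} containing no
    arithmetic progression of length k (exhaustive search, small N only)."""
    universe = range(1, N + 1)
    for size in range(N, -1, -1):
        for subset in combinations(universe, size):
            if not has_progression(subset, k):
                return size
    return 0
```

For instance, `largest_progression_free(9, 3)` returns 5, attained by the set {1, 2, 4, 8, 9}. Theorem 1.1 amounts to saying that, for fixed k, this maximum divided by N tends to 0 as N grows; the search above is exponential in N and only meant for experimentation.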



Szemeredi's Theorem follows immediately from its finite version and
the converse implication uses a simple compactness argument. It is
worth noting that this method cannot provide an explicit value for
the constants $N(k, \delta)$. The original proof of Szemeredi did not give
(usable) constants either and many in the combinatorial community
competed to find the best constants for progressions of length 3 and 4
(the winner for length 3 was Bourgain [Bo2]) until a few years ago when
Gowers [G1] gave a new proof with explicit constants for the general
case. He proved the theorem in a more analytical form (see Note 1.4.1),
already used by Roth and Bourgain:

Theorem. Let $N \geq 2$ be an integer and let $\mathbb{Z}/N\mathbb{Z}$ be endowed with its
normalized Haar measure m. For every integer $k \geq 2$ and every real
$\delta > 0$ there exists a constant $c(k, \delta) > 0$, not depending on N, such
that:

For every function f on $\mathbb{Z}/N\mathbb{Z}$ with $0 \leq f \leq 1$ and $\int f(x)\,dm(x) \geq \delta$,

\[ \iint f(x) f(x+y) f(x+2y) \cdots f(x+(k-1)y) \, dm(x)\, dm(y) \geq c(k, \delta) . \tag{1.1} \]

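The double integral in the analytic form is a finite average and can be evaluated directly for small N. The sketch below is an illustration only (the name `progression_average` is ours); it represents f as a list of N values in [0, 1] and averages the product over all pairs (x, y), including the degenerate pairs with y = 0.

```python
def progression_average(f, k):
    """Average over x, y in Z/NZ of f(x) f(x+y) ... f(x+(k-1)y),
    where f is given as a list of N values in [0, 1]."""
    N = len(f)
    total = 0.0
    for x in range(N):
        for y in range(N):
            term = 1.0
            for j in range(k):
                term *= f[(x + j * y) % N]
            total += term
    return total / (N * N)
```

When f is the indicator function of a set E, the result is the number of pairs (x, y) for which x, x + y, . . . , x + (k-1)y all lie in E, divided by N^2. For E = {0, 1} in Z/5Z and k = 3 only the two degenerate pairs with y = 0 contribute, so the value is 2/25.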


Gowers' proof uses methods of both Fourier Analysis (the circle
method) and combinatorial number theory, in particular Freiman's
results ([Fr1], [Fr2]) on the sums of sets of integers.

Although it does not seem related to our topic it would be difficult 
not to mention the spectacular new result by Green and Tao which 
answered a very old question. 

Theorem (Green & Tao [GrT]). The set of primes contains arbitrarily
long arithmetic progressions.

(See also Note 1.4.2.) There exists a relation between the proof of
Green and Tao and ergodic theory, even though this relation is not
completely understood at this time (see Appendix A.2).

1.2. Furstenberg's Theorem and its generalizations. Before stat- 
ing the result we fix some notation and some conventions. 

In general, we write $(X, \mu)$ for a probability space, omitting the
$\sigma$-algebra; when needed it is denoted by the corresponding calligraphic
letter $\mathcal{X}$. We always assume that $\mathcal{X}$ is countably generated and that $\mathcal{X}$
is the Borel $\sigma$-algebra whenever X is endowed with a (Polish) topology.
Throughout these notes, every subset of X is implicitly assumed to be
measurable and the term "bounded function on X" means bounded
and measurable.

By a system, we mean a probability space $(X, \mu)$ endowed with an
invertible, bi-measurable, measure preserving transformation $T\colon X \to X$
and we write the system as $(X, \mu, T)$. In the main results the hypothesis
of invertibility of T can be removed by passing to the natural extension.

In 1977 Furstenberg proved a very beautiful result about multiple 
recurrence in ergodic theory: 

Furstenberg's Theorem ([Fu2]; [FuKO] is easier to read). Let $(X, \mu, T)$
be a system, let $A \in \mathcal{X}$ be a set with $\mu(A) > 0$ and let $k \geq 1$ be an integer.
Then

\[ \liminf_{N\to+\infty} \frac{1}{N} \sum_{n=0}^{N-1} \mu\bigl(A \cap T^{n}A \cap T^{2n}A \cap \dots \cap T^{(k-1)n}A\bigr) > 0 . \]

In particular there exists $n \geq 1$ such that
$\mu(A \cap T^{n}A \cap T^{2n}A \cap \dots \cap T^{(k-1)n}A) > 0$. Then Furstenberg deduced
Szemeredi's Theorem by using the following Correspondence Principle
(with $m_j = (j-1)n$ for $j = 1, 2, \dots, k$):

Furstenberg's Correspondence Principle ([Fu3]). Let E be a set
of integers with positive upper density. There exist a system $(X, \mu, T)$
and a subset A of X with $\mu(A) = d^*(E)$ such that

\[ d^*\bigl((E + m_1) \cap \dots \cap (E + m_k)\bigr) \geq \mu\bigl(A \cap T^{m_1}A \cap \dots \cap T^{m_k}A\bigr) \]

for all integers $k \geq 1$ and all $m_1, \dots, m_k \in \mathbb{N}$.

This indirect proof cannot provide explicit bounds in the finite
version of Szemeredi's Theorem (Theorem 1.1), but it is conceptually much
simpler than the original proof. Moreover the direction initiated by
Furstenberg's Theorem led to several generalizations, each of them
inducing its combinatorial counterpart by the Correspondence Principle
(or by some variation of it). For some of these generalizations,
there is still no known proof other than the ergodic theoretic proof.
Below we only discuss two of these generalizations.

Recently Tao [T] gave a new proof of Szemeredi's Theorem that can
be viewed as a cross between combinatorial and ergodic methods: the
proof is purely combinatorial in the sense that it deals only with finite
sets (subsets of $\mathbb{Z}/N\mathbb{Z}$). But the vocabulary and the "philosophy" of
the paper are much closer to ergodic theory. The proof uses in
particular an induction that mimics the inductive construction of a sequence
of extensions used in Furstenberg's paper. It would be interesting to
compare the way Tao uses Gowers' norms with the way we use the
ergodic seminorms in [HK2] (see Section 3 and Appendix A.2).

1.2.1. The Polynomial Szemeredi Theorem. In the next theorem, proved
by Bergelson and Leibman, the exponents n, 2n, . . . , (k-1)n appearing in
Furstenberg's Theorem are replaced by integer polynomials $p_1(n), p_2(n),
\dots$ (an integer polynomial is a polynomial taking integer values on the
integers).

Polynomial Multiple Recurrence Theorem (Bergelson & Leibman
[BL96]). Let $(X, \mu, T)$ be a system, let $\ell \geq 1$ be an integer and let
$p_1(n), p_2(n), \dots, p_{\ell-1}(n)$ be integer polynomials with $p_j(0) = 0$ for
$j = 1, 2, \dots, \ell-1$. For every $A \in \mathcal{X}$ with $\mu(A) > 0$ we have

\[ \liminf_{N\to+\infty} \frac{1}{N} \sum_{n=0}^{N-1} \mu\bigl(A \cap T^{p_1(n)}A \cap T^{p_2(n)}A \cap \dots \cap T^{p_{\ell-1}(n)}A\bigr) > 0 . \]

The Correspondence Principle immediately gives: 

Polynomial Szemeredi's Theorem ([BL96]). Let E be a set of
integers with positive upper density and $p_1(n), p_2(n), \dots, p_{\ell-1}(n)$ integer
polynomials with $p_j(0) = 0$ for $j = 1, 2, \dots, \ell-1$. Then there exist
integers a and d > 0 such that

\[ \{a, a + p_1(d), a + p_2(d), \dots, a + p_{\ell-1}(d)\} \subset E . \]

To date, the ergodic proof is the only one known; in particular,
there is no known combinatorial proof.

1.2.2. The Multidimensional Szemeredi Theorem. The following theo- 
rem of Furstenberg and Katznelson generalizes Furstenberg's Theorem 
for several commuting transformations. 

Multidimensional Recurrence Theorem (Furstenberg & Katznelson
[FuK1]). Let $k \geq 1$ be an integer and $T_1, T_2, \dots, T_k$ commuting measure
preserving transformations of the probability space $(X, \mu)$. Then
for any subset A of X with $\mu(A) > 0$ we have

\[ \liminf_{N\to+\infty} \frac{1}{N} \sum_{n=0}^{N-1} \mu\bigl(T_1^{n}A \cap T_2^{n}A \cap \dots \cap T_k^{n}A\bigr) > 0 . \]

The original theorem corresponds to the case that $T_j = T^{j-1}$ for
$j = 1, \dots, k$.

The upper density of a subset of $\mathbb{N}^k$ is defined analogously to that of
a subset of N. The combinatorial counterpart (see Note 1.4.3) of the
theorem above is:

Multidimensional Szemeredi's Theorem ([FuK1]). Let $E \subset \mathbb{N}^k$
be a set of positive upper density. Then for any finite subset F of $\mathbb{N}^k$
there exist $a \in \mathbb{N}^k$ and an integer d > 0 such that $a + d.F \subset E$.

Here, $a + d.F = \{a + d.x : x \in F\}$. When $F = \{0, 1, \dots, \ell-1\}^k$ the
set $a + d.F$ can be called an arithmetic progression of dimension k and
length $\ell$, hence the name of the theorem. The first combinatorial proof
of this result was given by Gowers [G2].

There exist several other generalizations of Furstenberg's Theorem
and for each of them a generalization of Szemeredi's Theorem (see for
example [BMc]). The deepest result in this class is the Density
Hales-Jewett Theorem [FuK2] of Furstenberg and Katznelson.

1.3. Convergence results. It is a natural question to ask whether the 
liminf in Furstenberg's Theorem and in its generalizations are actually 
limits. Our main goal in these lectures is to give an idea of the proof 
of the following results, that we refer to as convergence theorems for 
multiple ergodic averages. The first theorem shows the convergence for 
arithmetic progressions: 

Theorem 1.2 (Host & Kra [HK2]). Let $(X, \mu, T)$ be a system, $k \geq 1$
be an integer and $f_1, f_2, \dots, f_k$ be bounded functions on X. Then the
averages

\[ \frac{1}{N} \sum_{n=0}^{N-1} f_1(T^{n}x) f_2(T^{2n}x) \cdots f_k(T^{kn}x) \tag{1.2} \]

converge in $L^2(\mu)$.

Taking $f_1 = f_2 = \dots = f_k = 1_A$ and integrating (1.2) over A we get
that the liminf in Furstenberg's Theorem is a limit.

The convergence in $L^2(\mu)$ of these averages for k = 3 with the added
hypothesis that the system is totally ergodic was shown by Conze and
Lesigne in a series of papers ([CL1], [CL3], [CL2], see also [Le5]) and
by Host and Kra [HK1] in the general case (see also [FuW]).
Furstenberg [Fu2] proved the convergence for every k under the assumption
that the system is weakly mixing.

A similar convergence result also holds for polynomials: 

Theorem 1.3 (Host & Kra [HK3]; Leibman [L3]). Let $(X, \mu, T)$ be a
system, $k \geq 1$ an integer, $p_1(n), p_2(n), \dots, p_k(n)$ integer polynomials
and $f_1, f_2, \dots, f_k$ bounded functions on X. Then the averages

\[ \frac{1}{N} \sum_{n=0}^{N-1} f_1(T^{p_1(n)}x) f_2(T^{p_2(n)}x) \cdots f_k(T^{p_k(n)}x) \tag{1.3} \]

converge in $L^2(\mu)$.

It follows that the averages appearing in the Polynomial Multiple 
Recurrence Theorem converge. 

The result of Theorem 1.3 was proved by Bergelson [B1] for weakly
mixing systems. Furstenberg and Weiss [FuW] showed the convergence
for two particular cases when k = 2: for $p_1(n) = n$, $p_2(n) = n^2$ and for
$p_1(n) = n^2$, $p_2(n) = n^2 + n$. The paper [HK6] contains a proof for the
general case, except when the system is not totally ergodic and at least
one polynomial is of degree 1 and some other is of degree > 1. This
restriction was lifted by Leibman [L3]. Frantzikinakis and Kra have
shown that if the system is totally ergodic and the polynomials $p_i$ are
linearly independent, then the limit of the averages (1.3) is constant
and equal to the product of the integrals of the functions.

The paper [HK2] also contains the proof of convergence of another
type of multiple ergodic averages, the cubic averages: see the
Appendix.

In the above results the averages on $[0, N-1]$ can be replaced by
averages on any sequence of intervals whose lengths tend to infinity.
It is sufficient to prove these two theorems for ergodic systems, as
the general case follows by ergodic decomposition. So we henceforth
assume ergodicity.

The case of several commuting transformations remains essentially 
open. The problem can be formulated as follows: 

Question. Let $k \geq 1$ be an integer and $T_1, T_2, \dots, T_k$ commuting measure
preserving transformations of the probability space $(X, \mu)$. Is it
true that for all bounded functions $f_1, f_2, \dots, f_k$ on X the averages

\[ \frac{1}{N} \sum_{n=0}^{N-1} f_1(T_1^{n}x) f_2(T_2^{n}x) \cdots f_k(T_k^{n}x) \]

converge in $L^2(\mu)$?

The answer was shown to be positive for two transformations by
Conze and Lesigne [CL1]. Frantzikinakis and Kra [FK2] proved the
convergence for an arbitrary number of transformations under the
additional hypothesis that all the transformations $T_j$ and all the
transformations $T_i T_j^{-1}$, $i \neq j$, are ergodic. These are very strong hypotheses:
it can be assumed without loss that the transformations are jointly
ergodic but not that any individual transformation is ergodic. The tools
developed below for one transformation do not generalize to the case
of several transformations.

The convergence almost everywhere of the different averages considered
here is an open and probably very difficult problem. The only
result in this direction is due to Bourgain [Bo1]: the convergence a.e.
of the averages (1.2) for k = 2.

1.4. Notes on Section 1.

1.4.1. The analytic form. The finite version of Szemeredi's Theorem
(Theorem 1.1) follows easily from the analytic form, but the converse
implication is trickier. The following result is apparently stronger
than Szemeredi's Theorem but can be deduced from it (this is a
non-trivial exercise):

Theorem. For every integer $\ell \geq 2$ and every real $\delta > 0$ there exists a
constant $c(\ell, \delta) > 0$ such that:

For every integer $N \geq 1$, every subset E of {1, 2, . . . , N} with more
than $\delta N$ elements contains at least $c(\ell, \delta)N^2 - 1$ arithmetic progressions
of length $\ell$.

(The -1 in the formula is simply a way to eliminate trivialities for
small N.)

This immediately implies the analytic form in the particular case
that the function f takes its values in {0, 1}. The general case follows
by standard methods.



1.4.2. About the theorem of Green and Tao. The set P of primes
satisfies

\[ \sum_{p \in P} \frac{1}{p} = +\infty \]

and it is natural to ask whether every subset of N* with a divergent
series of inverses contains arbitrarily long arithmetic progressions. This
question was asked by Erdos and Turan [ET] in 1936 and is still open.



1.4.3. From the Multidimensional Recurrence Theorem to the
Multidimensional Szemeredi Theorem. This implication uses a generalization
of the Correspondence Principle that we state here. Let $T_1, T_2, \dots, T_k$
be commuting measure preserving transformations of the probability
space $(X, \mu)$. For $n = (n_1, n_2, \dots, n_k) \in \mathbb{Z}^k$ we write $T^{n} = T_1^{n_1} T_2^{n_2} \cdots T_k^{n_k}$.

Proposition. Let $E \subset \mathbb{N}^k$ be a set of positive upper density. There
exist k measure preserving transformations $T_1, T_2, \dots, T_k$ of a probability
space $(X, \mu)$ and a subset A of X with $\mu(A) = d^*(E)$ and

\[ d^*\Bigl(\bigcap_{n \in F} (E + n)\Bigr) \geq \mu\Bigl(\bigcap_{n \in F} T^{n}A\Bigr) \quad \text{for every finite subset } F \text{ of } \mathbb{N}^k . \]

Then apply the Recurrence Theorem with the commuting measure
preserving transformations $T^{n}$, $n \in F$.

2. Nilmanifolds and Nilsystems

We present here a class of systems for which the convergence results
(Theorems 1.2 and 1.3) can be proven in an easier way. In the rest of
these lectures we explain without details how the general case can be
deduced from this particular one.

2.1. Definitions and fundamental properties. Let G be a group.

For $g, h \in G$ we write $[g, h] = g^{-1}h^{-1}gh$. The lower central series

\[ G = G_1 \supset G_2 \supset \dots \supset G_j \supset G_{j+1} \supset \dots \]

of G is defined by $G_1 = G$ and, for $j \geq 1$,

$G_{j+1}$ is the subgroup of G spanned by $\{[g, h] : g \in G,\ h \in G_j\}$.

Let $k \geq 1$ be an integer. We say that G is k-step nilpotent if
$G_{k+1} = \{1\}$.

Let G be a k-step nilpotent Lie group and let $\Lambda$ be a discrete
cocompact subgroup. The compact manifold $X = G/\Lambda$ is called a k-step
nilmanifold. The fundamental properties of nilmanifolds were
established by Malcev [M]. We recall here only the properties we need.

The group G naturally acts on X by left translation and we write
$(g, x) \mapsto g \cdot x$ for this action. There exists a unique Borel probability
measure $\mu$ on X invariant under this action, called the Haar measure
of X. We make use of the following property which appears in [M] for
connected groups and is proved in Leibman [L2] in a similar way for
the general case:

• For every integer $j \geq 1$, the subgroups $G_j$ and $\Lambda G_j$ are closed
in G. It follows that the group $\Lambda_j = \Lambda \cap G_j$ is cocompact in $G_j$.

Let t be a fixed element of G and let $T\colon X \to X$ be the
transformation $x \mapsto t \cdot x$. Then (X, T) is called a k-step topological nilsystem
and $(X, \mu, T)$ is called a k-step nilsystem. All the notation introduced
above is used freely throughout the rest of this section.

Fundamental properties of nilsystems were established by Auslander,
Green and Hahn [AuGH] and by Parry [P1]. Further ergodic
properties were proven by Parry [P2] and Lesigne [Le4] when the group G
is connected, and generalized by Leibman ([L2], [L3]). We state the
results we need in the next two theorems.

Theorem 2.1. The following properties are equivalent: 

• $(X, \mu, T)$ is ergodic.

• (X, T) is uniquely ergodic, meaning that $\mu$ is the unique
T-invariant probability measure on X.

• (X, T) is minimal, meaning that every orbit under T is dense.

• (X, T) is transitive, meaning that there exists at least one dense
orbit under T.

The second theorem can be viewed as a particular case of general
results of Ratner ([Ra]) and Shah ([Sh]).

Theorem 2.2. Let $x \in X$ and let Y be the closure of the orbit of
x under T. Then (Y, T) can be given the structure of a topological
nilsystem, that is $Y = H/\Gamma$ where H is a closed Lie subgroup of G
containing t and $\Gamma$ is a closed cocompact subgroup of H.

By Theorem 2.1, (Y, T) is uniquely ergodic. We immediately deduce:

Corollary 2.3. For every continuous function f on X the averages

\[ \frac{1}{N} \sum_{n=0}^{N-1} f(T^{n}x) \]

converge for every $x \in X$ as $N \to +\infty$.

2.2. Two examples. We review the simplest examples of 2-step nil- 
systems. 

2.2.1. Let $G = \mathbb{Z} \times \mathbb{T} \times \mathbb{T}$, with multiplication given by

\[ (k, x, y) * (k', x', y') = (k + k', x + x', y + y' + 2kx') . \]

Then G is a Lie group. Its commutator subgroup is $\{0\} \times \{0\} \times \mathbb{T}$ and
G is 2-step nilpotent. The subgroup $\Lambda = \mathbb{Z} \times \{0\} \times \{0\}$ is discrete and
cocompact. Let X denote the nilmanifold $G/\Lambda$; we maintain the
notation of the preceding section.

Fix $\alpha \in \mathbb{T}$ and define $t = (1, \alpha, \alpha) \in G$ and let $T\colon X \to X$ be the
translation by t. Then $(X, \mu, T)$ is a 2-step nilsystem. It can be shown
that it is ergodic if and only if $\alpha$ is irrational.

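The translation by t can be made concrete by multiplying out t * (0, x, y) with the group law of this example. The Python sketch below is an illustration only: the function names are ours, the circle is modelled as the rationals modulo 1 through `fractions.Fraction` so that the arithmetic is exact, and rational values of $\alpha$ are used purely for checking the algebra (they do not give an ergodic system).

```python
from fractions import Fraction

def mul(g, h):
    """Group law (k, x, y) * (k', x', y') = (k + k', x + x', y + y' + 2 k x'),
    with the last two coordinates taken modulo 1."""
    k, x, y = g
    k2, x2, y2 = h
    return (k + k2, (x + x2) % 1, (y + y2 + 2 * k * x2) % 1)

def translate(alpha, point):
    """Image of the coset of (0, x, y) under left translation by
    t = (1, alpha, alpha), read in the coordinates (x, y)."""
    x, y = point
    _, x_new, y_new = mul((1, alpha, alpha), (0, x, y))
    return (x_new, y_new)

def orbit_point(alpha, point, n):
    """Closed form of the n-th iterate:
    (x + n alpha, y + 2 n x + n^2 alpha) modulo 1."""
    x, y = point
    return ((x + n * alpha) % 1, (y + 2 * n * x + n * n * alpha) % 1)
```

One step of `translate` sends (x, y) to (x + alpha, y + 2x + alpha), and iterating it n times agrees with `orbit_point`; in particular the orbit of (0, 0) is the sequence (n alpha, n^2 alpha), which is why this system is related to the distribution of n^2 alpha modulo 1.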
We give an alternate description of this system. The map $(k, x, y) \mapsto
(x, y)$ from G to $\mathbb{T}^2$ induces a homeomorphism of X onto $\mathbb{T}^2$. Identifying
X with $\mathbb{T}^2$ via this homeomorphism, the measure $\mu$ becomes equal to
$m_{\mathbb{T}} \times m_{\mathbb{T}}$ where $m_{\mathbb{T}}$ is the Haar measure of $\mathbb{T}$ and the transformation
T of X is given for $(x, y) \in \mathbb{T}^2 = X$ by

\[ T(x, y) = (x + \alpha, y + 2x + \alpha) . \tag{2.1} \]

2.2.2. Let G be the Heisenberg group $\mathbb{R} \times \mathbb{R} \times \mathbb{R}$, with multiplication
given by

\[ (x, y, z) * (x', y', z') = (x + x', y + y', z + z' + xy') . \tag{2.2} \]

Then G is a 2-step nilpotent Lie group. The subgroup $\Lambda = \mathbb{Z} \times \mathbb{Z} \times \mathbb{Z}$
is discrete and cocompact. Let $X = G/\Lambda$ and let T be the translation
by some $t = (t_1, t_2, t_3) \in G$. We have that $(G/\Lambda, T)$ is a nilsystem. It
can be shown that it is ergodic if and only if $t_1$ and $t_2$ are independent
over $\mathbb{Q}$.

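That the Heisenberg group is 2-step nilpotent can be checked by computing commutators directly from the multiplication law (2.2). The following Python sketch is illustrative only (the helper names `mul`, `inv` and `commutator` are ours); it uses the convention [g, h] = g^{-1} h^{-1} g h of Section 2.1.

```python
def mul(g, h):
    """Heisenberg group law (2.2):
    (x, y, z) * (x', y', z') = (x + x', y + y', z + z' + x y')."""
    x, y, z = g
    x2, y2, z2 = h
    return (x + x2, y + y2, z + z2 + x * y2)

def inv(g):
    """Inverse of g = (x, y, z) for the law above."""
    x, y, z = g
    return (-x, -y, -z + x * y)

def commutator(g, h):
    """[g, h] = g^{-1} h^{-1} g h."""
    return mul(mul(inv(g), inv(h)), mul(g, h))
```

Expanding the product gives [g, h] = (0, 0, xy' - x'y) for g = (x, y, z) and h = (x', y', z'), so every commutator lies in the subgroup {0} x {0} x R. Elements of that subgroup commute with everything, hence the next term of the lower central series is trivial.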
2.3. Convergence of multiple ergodic averages for nilsystems. 

Let $(X, \mu, T)$ be a nilsystem. We use here the notation of
Subsection 2.1.

Let $k \geq 2$ be an integer. Then $X^k$ can be given the structure of
a nilmanifold, quotient of the Lie group $G^k$ by its discrete cocompact
subgroup $\Lambda^k$. Set $s = (t, t^2, \dots, t^k) \in G^k$ and let S be the translation
by s on $X^k$. Then $(X^k, S)$ is a topological nilsystem.

Let $f_1, f_2, \dots, f_k$ be continuous functions on X. By applying
Theorem 2.2 to the nilsystem $(X^k, S)$ and to the continuous function
$(x_1, x_2, \dots, x_k) \mapsto f_1(x_1) f_2(x_2) \cdots f_k(x_k)$ at the point $(x, \dots, x) \in
X^k$ we get that the average

\[ \frac{1}{N} \sum_{n=0}^{N-1} f_1(T^{n}x) f_2(T^{2n}x) \cdots f_k(T^{kn}x) \]

converges for every $x \in X$. By a standard density argument we have:

Corollary 2.4. Theorem 1.2 holds for nilsystems.

An explicit expression of the limit was given by Ziegler [Z1] (a shorter
proof can be found in [BHK]).

A similar result holds for the polynomial averages:

Corollary 2.5 ([L2]). Theorem 1.3 holds for nilsystems.

In place of Corollary 2.3 the proof uses an extension due to Leibman
of this result for polynomial sequences in a nilmanifold.



3. Some seminorms

In this Section and the next ones $(X, \mu, T)$ is an ergodic system.
We introduce a sequence of seminorms on $L^\infty(\mu)$ that we use to bound
the different averages under consideration. These seminorms should
be compared with the norms introduced by Gowers in a completely
different context: see Appendix A.2.

3.1. Notation. We need some notation to be used throughout the
remainder of these notes.

We write $C\colon \mathbb{C} \to \mathbb{C}$ for the conjugacy map $z \mapsto \bar z$. Let $k \geq 1$ be an
integer. We write $\epsilon = \epsilon_1 \epsilon_2 \dots \epsilon_k$ with $\epsilon_i \in \{0, 1\}$ for a point of $\{0, 1\}^k$,
without commas or parentheses and $|\epsilon| = \epsilon_1 + \epsilon_2 + \dots + \epsilon_k$. $\mathbf{0}$ denotes
the point $00 \dots 0 \in \{0, 1\}^k$.

For every integer $k \geq 0$ we write $X^{[k]} = X^{2^k}$ and for $k \geq 1$ the
points of $X^{[k]}$ are written $x = (x_\epsilon : \epsilon \in \{0, 1\}^k)$. We write $T^{[k]}$ for
the transformation $T \times T \times \dots \times T$ ($2^k$ times) of this space. We often
identify $X^{[k+1]}$ with $X^{[k]} \times X^{[k]}$, writing $x = (x', x'')$ for a point of $X^{[k+1]}$
where $x', x'' \in X^{[k]}$ are defined by

for every $\epsilon \in \{0, 1\}^k$, $x'_\epsilon = x_{\epsilon 0}$ and $x''_\epsilon = x_{\epsilon 1}$.

3.2. Construction of some measures and some seminorms. We
define by induction a $T^{[k]}$-invariant measure $\mu^{[k]}$ on $X^{[k]}$ for every
integer $k \geq 0$. Set $\mu^{[0]} = \mu$. Assume that $\mu^{[k]}$ is defined for some $k \geq 0$.
Let $\mathcal{I}^{[k]}$ denote the $T^{[k]}$-invariant $\sigma$-algebra of $(X^{[k]}, \mu^{[k]}, T^{[k]})$.
Identifying $X^{[k+1]}$ with $X^{[k]} \times X^{[k]}$ as explained above, we define the system
$(X^{[k+1]}, \mu^{[k+1]}, T^{[k+1]})$ to be the relatively independent joining of two
copies of $(X^{[k]}, \mu^{[k]}, T^{[k]})$ over $\mathcal{I}^{[k]}$.

This means that $\mu^{[k+1]}$ is the measure on $X^{[k+1]} = X^{[k]} \times X^{[k]}$
characterized by: when $F'$ and $F''$ are bounded functions on $X^{[k]}$,

\[ \int F'(x') F''(x'') \, d\mu^{[k+1]}(x) = \int \mathbb{E}(F' \mid \mathcal{I}^{[k]}) \, \mathbb{E}(F'' \mid \mathcal{I}^{[k]}) \, d\mu^{[k]} . \tag{3.1} \]

This measure is invariant under $T^{[k+1]} = T^{[k]} \times T^{[k]}$ and each of its
two natural projections on $X^{[k]}$ is equal to $\mu^{[k]}$. Note that when F is
a function on $X^{[k]}$, measurable with respect to $\mathcal{I}^{[k]}$, that is, invariant
under $T^{[k]}$, we have

\[ F(x') = F(x'') \quad \text{for } \mu^{[k+1]}\text{-almost every } x = (x', x'') \in X^{[k+1]} . \tag{3.2} \]

By induction, each of the $2^k$ natural projections of $\mu^{[k]}$ on X is equal
to $\mu$. It follows immediately from this definition that for every bounded
function f on X the integral

\[ \int \prod_{\epsilon \in \{0,1\}^k} C^{|\epsilon|} f(x_\epsilon) \, d\mu^{[k]}(x) \]

is real and nonnegative and we can define

\[ |||f|||_k = \Bigl( \int \prod_{\epsilon \in \{0,1\}^k} C^{|\epsilon|} f(x_\epsilon) \, d\mu^{[k]}(x) \Bigr)^{1/2^k} . \tag{3.3} \]

As X is assumed to be ergodic, the $\sigma$-algebra $\mathcal{I}^{[0]}$ is trivial and
$\mu^{[1]} = \mu \times \mu$. We therefore have

\[ |||f|||_1 = \Bigl( \int f(x_0) \overline{f(x_1)} \, d(\mu \times \mu)(x_0, x_1) \Bigr)^{1/2} = \Bigl| \int f(x) \, d\mu(x) \Bigr| . \]

From the inductive definition (3.1) of the measures we get that
$|||f|||_{k+1} \geq |||f|||_k$ for every f and every k.

The following Lemma follows immediately from the definition of the
measures and the Ergodic Theorem.

Lemma 3.1. For every integer $k \geq 0$ and every bounded function f
on X,

\[ |||f|||_{k+1} = \Bigl( \lim_{N\to+\infty} \frac{1}{N} \sum_{n=0}^{N-1} ||| f \cdot \overline{f} \circ T^{n} |||_k^{2^k} \Bigr)^{1/2^{k+1}} . \tag{3.4} \]
3.3. The Kronecker factor and the measure $\mu^{[2]}$.

3.3.1. The notion of a factor. As usual in ergodic theory, we use the
term factor with two different but equivalent meanings. First, a factor
of the system $(X, \mu, T)$ is a T-invariant sub-$\sigma$-algebra $\mathcal{Y}$ of $\mathcal{X}$.

On the other hand, if $(Y, \nu, S)$ is a system and $\pi\colon X \to Y$ is a
measurable map mapping $\mu$ to $\nu$ and such that $S \circ \pi = \pi \circ T$
($\mu$-a.e.), then $\pi$ is called a factor map and Y is also called a factor of
X. In this situation we always identify the $\sigma$-algebra $\mathcal{Y}$ of Y with its
inverse image $\pi^{-1}(\mathcal{Y})$, which is an invariant sub-$\sigma$-algebra of $\mathcal{X}$, that
is, a factor of X under the first definition. It is thus natural to denote
the transformation on Y by the same letter as the transformation on
X, meaning by T in our case.

It can be shown that every invariant sub-$\sigma$-algebra of $\mathcal{X}$ can be
associated to a factor map in this way and thus the two definitions
are functionally equivalent. We pass freely from one definition to the
other.

Let f be an integrable function on X. We consider $\mathbb{E}(f \mid \mathcal{Y})$ as a
function defined on X and we write $\mathbb{E}(f \mid Y)$ for the function on Y
defined by $\mathbb{E}(f \mid Y) \circ \pi = \mathbb{E}(f \mid \mathcal{Y})$. It is characterized by:

\[ \forall g \in L^\infty(\nu), \quad \int_Y \mathbb{E}(f \mid Y)(y) \cdot g(y) \, d\nu(y) = \int_X f(x) \cdot g(\pi(x)) \, d\mu(x) . \]

3.3.2. The Kronecker factor. The Kronecker factor of the system X is
written $Z_1(X)$ or $Z_1$ when the system under consideration is clear from
the context. We recall here the definition and some classical properties.

Viewed as a $\sigma$-algebra, the Kronecker factor $\mathcal{Z}_1$ of X is defined to be
the sub-$\sigma$-algebra of $\mathcal{X}$ generated by the eigenfunctions of this system;
it is also the smallest sub-$\sigma$-algebra of $\mathcal{X}$ such that all the invariant
functions of the system $(X \times X, \mu \times \mu, T \times T)$ are measurable with
respect to $\mathcal{Z}_1 \otimes \mathcal{Z}_1$.

When considered as a system, the Kronecker factor $(Z_1, m, T)$ is a
rotation; this means that $Z_1$ is a compact abelian group with Haar
measure m and that the transformation T has the form $z \mapsto z + \alpha$
where $\alpha$ is a fixed element of $Z_1$.

We write $\pi_1\colon X \to Z_1$ for the factor map. Then every eigenfunction
of X has the form $f(x) = c\,\chi(\pi_1(x))$ where c is a constant and $\chi$ is
a character of $Z_1$, that is, a continuous group homomorphism from $Z_1$
to the circle group $S^1$. Every $T \times T$-invariant function on $X \times X$ can
be written $f(x_0, x_1) = g(\pi_1(x_1) - \pi_1(x_0))$ where g is a function on $Z_1$.
Therefore, when $f_0$ and $f_1$ are bounded functions on X,

\[ \mathbb{E}(f_0 \otimes f_1 \mid \mathcal{I}^{[1]})(x_0, x_1) = \int_{Z_1} \mathbb{E}(f_0 \mid Z_1)(z) \, \mathbb{E}(f_1 \mid Z_1)\bigl(z + \pi_1(x_1) - \pi_1(x_0)\bigr) \, dm(z) . \]


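3.3.3. The measure $\mu^{[2]}$. We deduce a more explicit expression for the
measure $\mu^{[2]}$: when $f_\epsilon$, $\epsilon \in \{0, 1\}^2$, are four measurable functions on X,

\[ \int f_{00}(x_{00}) f_{01}(x_{01}) f_{10}(x_{10}) f_{11}(x_{11}) \, d\mu^{[2]}(x)
 = \int_{Z_1 \times Z_1 \times Z_1} \tilde f_{00}(z) \tilde f_{01}(z+s) \tilde f_{10}(z+t) \tilde f_{11}(z+s+t) \, dm(z) \, dm(s) \, dm(t) \tag{3.5} \]

where we write $\tilde f_\epsilon$ for $\mathbb{E}(f_\epsilon \mid Z_1)$. This implies that $|||f|||_2$ is the $\ell^4$-norm
of the Fourier Transform of $\mathbb{E}(f \mid Z_1)$. In particular:

Lemma 3.2. For every bounded function f on X, $|||f|||_2 = 0$ if and
only if $\mathbb{E}(f \mid Z_1) = 0$.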

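The identification of $|||f|||_2$ with an $\ell^4$-norm can be tested numerically in the simplest case. The sketch below is an illustration under an assumption of ours: the system is the rotation x -> x + 1 on Z/NZ with its uniform measure, which is ergodic and equal to its own Kronecker factor, so that E(f | Z_1) = f. The function names are also ours. The first function evaluates the right-hand side of (3.5) with f_00 = f_11 = f and f_01 = f_10 equal to the conjugate of f, which is the fourth power of $|||f|||_2$; the second computes the sum of the fourth powers of the moduli of the Fourier coefficients.

```python
import cmath

def u2_fourth_power(f):
    """Average over z, s, t in Z/NZ of
    f(z) conj(f(z+s)) conj(f(z+t)) f(z+s+t)."""
    N = len(f)
    total = 0
    for z in range(N):
        for s in range(N):
            for t in range(N):
                total += (f[z] * f[(z + s) % N].conjugate()
                          * f[(z + t) % N].conjugate() * f[(z + s + t) % N])
    return total / N**3

def fourier_l4_fourth_power(f):
    """Sum over frequencies xi of |c(xi)|^4, where
    c(xi) = (1/N) sum_x f(x) exp(-2 pi i x xi / N)."""
    N = len(f)
    coeffs = [sum(f[x] * cmath.exp(-2j * cmath.pi * x * xi / N) for x in range(N)) / N
              for xi in range(N)]
    return sum(abs(c) ** 4 for c in coeffs)
```

The two functions agree up to rounding error for any list of complex numbers. For a character of Z/NZ both return 1 although the integral of the character is 0: the seminorm of order 2 detects more than the mean, in line with Lemma 3.2.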
We give one more formula for the measure $\mu^{[2]}$. For every $s \in Z_1$ we
define a probability measure $\mu_s$ on $X \times X$ by

\[ \int_{X \times X} f_0(x_0) f_1(x_1) \, d\mu_s(x_0, x_1) = \int_{Z_1} \mathbb{E}(f_0 \mid Z_1)(z) \, \mathbb{E}(f_1 \mid Z_1)(z+s) \, dm(z) . \tag{3.6} \]

This measure is invariant under $T \times T$ and we have

\[ \mu \times \mu = \int_{Z_1} \mu_s \, dm(s) . \tag{3.7} \]

From the remarks above it follows that this formula is the ergodic
decomposition of $\mu \times \mu$ under $T \times T$; in particular the system
$(X \times X, \mu_s, T \times T)$ is ergodic for m-almost every $s \in Z_1$. We have:

\[ \mu^{[2]} = \int_{Z_1} \mu_s \times \mu_s \, dm(s) \]

(notice that $\mu_s \times \mu_s$ is a measure on $(X \times X) \times (X \times X) = X^{[2]}$).



3.4. Seminorms and arithmetic progressions. Later we prove that
for every $k \geq 1$ the map $f \mapsto |||f|||_k$ is a seminorm on $L^\infty(\mu)$.
Admitting this for the moment we show now how these seminorms arise in
the questions of convergence we are studying.

Proposition 3.3. Let $f_1, f_2, \dots, f_k$ be bounded functions on X with
$\|f_i\|_\infty \leq 1$ for $i = 1, 2, \dots, k$. Then

\[ \limsup_{N\to+\infty} \Bigl\| \frac{1}{N} \sum_{n=0}^{N-1} f_1(T^{n}x) f_2(T^{2n}x) \cdots f_k(T^{kn}x) \Bigr\|_{L^2(\mu)} \leq \min_{1 \leq \ell \leq k} \bigl( \ell \cdot |||f_\ell|||_k \bigr) . \tag{3.8} \]

The proof relies on an iterated use of a Hilbert space variant of the
van der Corput Lemma.


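Van der Corput Lemma ([B1]). Let $\{\xi_n\}$ be a sequence in a Hilbert
space $\mathcal{H}$, with $\|\xi_n\| \leq 1$ for every n. Then

\[ \limsup_{N\to+\infty} \Bigl\| \frac{1}{N} \sum_{n=0}^{N-1} \xi_n \Bigr\|^2 \leq \limsup_{H\to+\infty} \frac{1}{H} \sum_{h=0}^{H-1} \limsup_{N\to+\infty} \Bigl| \frac{1}{N} \sum_{n=0}^{N-1} \langle \xi_n \mid \xi_{n+h} \rangle \Bigr| . \tag{3.9} \]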

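Proof of Proposition 3.3. We proceed by induction. For k = 1 the
bound is given by the Ergodic Theorem and the definition of $|||\cdot|||_1$. Let
$k \geq 2$ and assume that the bound holds for k - 1. Let $f_1, \dots, f_k$ be as
in the statement, and choose $\ell \in \{2, \dots, k\}$ (the case $\ell = 1$ is similar).
Write

\[ \xi_n(x) = f_1(T^{n}x) f_2(T^{2n}x) \cdots f_k(T^{kn}x) . \]

For every $h \geq 0$, by using the Cauchy-Schwarz Inequality and the
invariance of $\mu$ under $T^{n}$ we get

\[ \Bigl| \frac{1}{N} \sum_{n=0}^{N-1} \langle \xi_n \mid \xi_{n+h} \rangle \Bigr| \leq \Bigl\| \frac{1}{N} \sum_{n=0}^{N-1} \prod_{i=2}^{k} \bigl( f_i \cdot \overline{f_i} \circ T^{ih} \bigr) \circ T^{(i-1)n} \Bigr\|_{L^2(\mu)} . \]

By the inductive assumption,

\[ \limsup_{N\to+\infty} \Bigl| \frac{1}{N} \sum_{n=0}^{N-1} \langle \xi_n \mid \xi_{n+h} \rangle \Bigr| \leq \ell \cdot ||| f_\ell \cdot \overline{f_\ell} \circ T^{\ell h} |||_{k-1} . \]

By the van der Corput Lemma,

\[ \limsup_{N\to+\infty} \Bigl\| \frac{1}{N} \sum_{n=0}^{N-1} \xi_n \Bigr\|^2
 \leq \ell \cdot \limsup_{H\to+\infty} \frac{1}{H} \sum_{h=0}^{H-1} ||| f_\ell \cdot \overline{f_\ell} \circ T^{\ell h} |||_{k-1} \]

\[ \leq \ell^2 \cdot \limsup_{H\to+\infty} \frac{1}{H} \sum_{h=0}^{H-1} ||| f_\ell \cdot \overline{f_\ell} \circ T^{h} |||_{k-1} \]

\[ \leq \ell^2 \cdot \limsup_{H\to+\infty} \Bigl( \frac{1}{H} \sum_{h=0}^{H-1} ||| f_\ell \cdot \overline{f_\ell} \circ T^{h} |||_{k-1}^{2^{k-1}} \Bigr)^{1/2^{k-1}} \]

and this last term is equal to $\ell^2 \cdot |||f_\ell|||_k^2$ by Lemma 3.1. □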

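The last two inequalities in the proof above use only that the terms are nonnegative, together with a convexity inequality. Spelled out as a routine verification, with a shorthand $a_h$ that is ours and not part of the original text:

```latex
% Shorthand (ours): a_h = ||| f_\ell \cdot \overline{f_\ell} \circ T^{h} |||_{k-1} \ge 0.
% Restricting a sum of nonnegative terms to the multiples of \ell can only decrease it:
\[
\frac{1}{H} \sum_{h=0}^{H-1} a_{\ell h}
\;\le\; \frac{1}{H} \sum_{h=0}^{\ell H - 1} a_h
\;=\; \ell \cdot \frac{1}{\ell H} \sum_{h=0}^{\ell H - 1} a_h .
\]
% Jensen's inequality for the convex function u \mapsto u^p with p = 2^{k-1} \ge 1:
\[
\frac{1}{H} \sum_{h=0}^{H-1} a_h
\;\le\; \Bigl( \frac{1}{H} \sum_{h=0}^{H-1} a_h^{\,p} \Bigr)^{1/p} .
\]
```

Taking the limsup as H tends to infinity in the first line costs one extra factor $\ell$, and Lemma 3.1, applied with k - 1 in place of k, identifies the limit of the right-hand side of the second line with the square of $|||f_\ell|||_k$.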
3.5. Seminorms and polynomial averages. A similar bound holds
for the averages (1.3) considered in Theorem 1.3.

Proposition 3.4. Let $k \geq 1$ be an integer and $p_1, p_2, \dots, p_k$ be integer
nonconstant polynomials such that for every $1 \leq i \neq j \leq k$ the
polynomial $p_i - p_j$ is not constant. There exists an integer $\ell \geq 1$ such that
for any bounded functions $f_1, f_2, \dots, f_k$ on X,

\[ \frac{1}{N} \sum_{n=0}^{N-1} f_1(T^{p_1(n)}x) f_2(T^{p_2(n)}x) \cdots f_k(T^{p_k(n)}x) \]

converges to zero in $L^2(\mu)$ whenever $|||f_i|||_\ell = 0$ for at least one value of
$i \in \{1, 2, \dots, k\}$.

The proof uses induction on the polynomial family $(p_1, p_2, \dots, p_k)$, by
using the van der Corput Lemma and the Cauchy-Schwarz Inequality
at each step. However this induction is much more intricate than for
arithmetic progressions and we do not present it here.

3.6. Invariance properties. Let $\{0,1\}^k$ be identified with the set of
vertices of the unit cube $[0,1]^k$.

We call the group of the k-cube the group of isometries of the Euclidean
cube $[0,1]^k$. We consider this group as acting on $X^{[k]}$ in the
following way. Each element $\sigma$ of the group induces a permutation,
written $\sigma$ also, of the set $\{0,1\}^k$ of vertices and this permutation in
turn induces a transformation $\sigma_*$ of $X^{[k]}$ by:

for every $\epsilon \in \{0,1\}^k$, $(\sigma_* x)_\epsilon = x_{\sigma(\epsilon)}$.

Lemma 3.5. For every integer $k$ the measure $\mu^{[k]}$ is invariant under
the action of the group of the k-cube.

An inequality similar to the classical Cauchy-Schwarz inequality can
be proven inductively by using this symmetry property:

Lemma 3.6. If $f_\epsilon$, $\epsilon \in \{0,1\}^k$, are $2^k$ bounded functions on $X$ then

(3.10)  $$\Bigl| \int \prod_{\epsilon\in\{0,1\}^k} f_\epsilon(x_\epsilon)\, d\mu^{[k]}(x) \Bigr| \le \prod_{\epsilon\in\{0,1\}^k} |||f_\epsilon|||_k .$$

It follows that the map $f \mapsto |||f|||_k$ is subadditive:

Corollary. For every $k \ge 1$, $|||\cdot|||_k$ is a seminorm on $L^\infty(\mu)$.

We use the geometric vocabulary for subsets of $\{0,1\}^k$; for example,
a side is a subset of $\{0,1\}^k$ of the form $\{\epsilon : \epsilon_i = 0\}$ or $\{\epsilon : \epsilon_i = 1\}$ for
some $i \in \{1, \dots, k\}$. When $\alpha$ is a side of $\{0,1\}^k$ we define the side
transformation $T_\alpha^{[k]}$ of $X^{[k]}$ by:

(3.11)  $$(T_\alpha^{[k]} x)_\epsilon = \begin{cases} T x_\epsilon & \text{if } \epsilon \in \alpha ; \\ x_\epsilon & \text{otherwise.} \end{cases}$$

Notice that the product of two side transformations corresponding to
opposite sides is equal to $T^{[k]}$.
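
The definition (3.11) can be illustrated on a toy example. The following is a minimal sketch, not taken from the text: it assumes the hypothetical finite system $X = \mathbb{Z}/5\mathbb{Z}$ with $T(x) = x + 1 \bmod 5$, represents a point of $X^{[k]}$ as a dictionary indexed by the vertices of $\{0,1\}^k$, and uses the helper names `side` and `side_transformation`, which are introduced here for illustration only.

```python
from itertools import product

def side(k, i, value):
    """Return the side {eps in {0,1}^k : eps[i] == value} as a frozenset of tuples."""
    return frozenset(eps for eps in product((0, 1), repeat=k) if eps[i] == value)

def side_transformation(T, alpha):
    """Build the map x -> T_alpha x: apply T to coordinates indexed by alpha, fix the others."""
    def T_alpha(x):
        return {eps: (T(x_eps) if eps in alpha else x_eps) for eps, x_eps in x.items()}
    return T_alpha

# Toy system (an assumption for illustration): X = Z/5Z with T(x) = x + 1 mod 5.
N = 5
T = lambda x: (x + 1) % N
k = 3
vertices = list(product((0, 1), repeat=k))
x = {eps: (7 * idx) % N for idx, eps in enumerate(vertices)}  # an arbitrary point of X^[k]

lower = side_transformation(T, side(k, 0, 0))   # side {eps : eps_1 = 0}
upper = side_transformation(T, side(k, 0, 1))   # opposite side {eps : eps_1 = 1}
diagonal = {eps: T(x_eps) for eps, x_eps in x.items()}  # the diagonal map T x ... x T
```

Composing `lower` and `upper` in either order applies $T$ to every coordinate exactly once, which is the remark made above about opposite sides.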

When $\alpha$ is the side $\{0,1\}^{k-1} \times \{1\}$, $T_\alpha^{[k]}$ is equal to $\mathrm{Id}^{[k-1]} \times T^{[k-1]}$
under the identification of $X^{[k]}$ with $X^{[k-1]} \times X^{[k-1]}$. By definition of
the measure $\mu^{[k]}$, it is invariant under this transformation. As the group
of the k-cube leaves the measure $\mu^{[k]}$ invariant and acts transitively
on the sides, we get:

Lemma 3.7. For every integer $k$ the measure $\mu^{[k]}$ is invariant under
all side transformations.

By induction, it can be checked that:

Lemma 3.8. The measure $\mu^{[k]}$ is ergodic under the joint action of the
side transformations.

3.7. Notes on Section 3. The measures $\mu^{[k]}$ can be described explicitly
in the case that the given ergodic system $(X, \mu, T)$ is a k-step
nilsystem. We put $X = G/\Lambda$ and use the notation of Section 2.

Let $k \ge 1$ be an integer. For every $g \in G$ and every side $\alpha$ of $\{0,1\}^k$
we define an element $g_\alpha^{[k]}$ of $G^{[k]}$ by

$$(g_\alpha^{[k]})_\epsilon = \begin{cases} g & \text{if } \epsilon \in \alpha ; \\ 1 & \text{otherwise.} \end{cases}$$

We define the side group of dimension $k$ to be the subgroup $G^{[k]}_{k-1}$ of
$G^{[k]}$ spanned by all these elements. It can be checked that $G^{[k]}_{k-1}$ is a
closed Lie subgroup of $G^{[k]}$ and that the subgroup $\Lambda^{[k]}_{k-1}$ of $\Lambda^{[k]}$ defined
in the same way is equal to $\Lambda^{[k]} \cap G^{[k]}_{k-1}$ and is discrete and cocompact
in $G^{[k]}_{k-1}$. The nilmanifold $X_k = G^{[k]}_{k-1} / \Lambda^{[k]}_{k-1}$ is naturally imbedded in
$X^{[k]}$ and the measure $\mu^{[k]}$ is the Haar measure of this nilmanifold.

4. Building factors 

In this Section we continue to assume that $(X, \mu, T)$ is an ergodic
system. Lemma 3.2 gives a simple relation between the seminorm $|||\cdot|||_2$
and the Kronecker factor $Z_1$ introduced in Subsection 3.3.2. For every
$k \ge 2$ we define here a factor $Z_k$ of $X$ with a similar relation to the
seminorm $|||\cdot|||_{k+1}$.

4.1. The definition of the factors $Z_k$. Let $k \ge 1$ be an integer. In
the following construction the coordinate of $X^{[k]}$ indexed by $\mathbf{0}$ plays a
particular role. Due to the symmetry of the measure $\mu^{[k]}$, any other
choice is possible with the same results, up to obvious changes in notation.

We note $X^{[k]*} = X^{2^k - 1}$ and write each point $x \in X^{[k]}$ as $x = (x_{\mathbf{0}}, \tilde{x})$,
where

$$\tilde{x} = (x_\epsilon : \epsilon \in \{0,1\}^k \setminus \{\mathbf{0}\}) \in X^{[k]*}$$

and thus we identify $X^{[k]}$ with $X \times X^{[k]*}$.

Recall that the projection of $\mu^{[k]}$ on $X$ is equal to $\mu$. We write $\mu^{[k]*}$ for
the projection of $\mu^{[k]}$ on $X^{[k]*}$. This measure is invariant under $T^{[k]*} =
T \times T \times \cdots \times T$ ($2^k - 1$ times). We say that the system $(X^{[k]}, \mu^{[k]}, T^{[k]})$
is a joining of the systems $(X, \mu, T)$ and $(X^{[k]*}, \mu^{[k]*}, T^{[k]*})$.

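Let $T_1^{[k]}, T_2^{[k]}, \dots, T_k^{[k]}$ be the side transformations of $X^{[k]}$ corresponding
to the $k$ sides of $\{0,1\}^k$ not containing $\mathbf{0} = (0, 0, \dots, 0)$: For
$1 \le i \le k$, $x \in X^{[k]}$ and $\epsilon \in \{0,1\}^k$,

$$(T_i^{[k]} x)_\epsilon = \begin{cases} T x_\epsilon & \text{if } \epsilon_i = 1 ; \\ x_\epsilon & \text{otherwise.} \end{cases}$$

For $1 \le i \le k$ the transformation $T_i^{[k]}$ leaves the coordinate indexed
by $\mathbf{0}$ invariant and thus it can be written $\mathrm{Id} \times T_i^{[k]*}$ for some measure
preserving transformation $T_i^{[k]*}$ of $X^{[k]*}$. Let $\mathcal{J}^{[k]*}$ be the $\sigma$-algebra of
subsets of $X^{[k]*}$ which are invariant under the transformations $T_i^{[k]*}$,
$1 \le i \le k$. An induction using relation (3.2) gives: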

Lemma 4.1. Let $B$ be a subset of $X^{[k]*}$. Then $B$ belongs to $\mathcal{J}^{[k]*}$ if
and only if there exists a subset $A$ of $X$ with

(4.1)  $1_A(x_{\mathbf{0}}) = 1_B(\tilde{x})$ for $\mu^{[k]}$-almost every $x \in X^{[k]}$.

This relation between $B$ and $A$ defines a bijection (up to null sets)
between the $\sigma$-algebra $\mathcal{J}^{[k]*}$ and some $\sigma$-algebra of $X$. We define:

Definition. $\mathcal{Z}_{k-1}(X)$ is the $\sigma$-algebra of subsets $A$ of $X$ such that
equality (4.1) holds for some subset $B$ of $X^{[k]*}$.

$\mathcal{Z}_1(X)$ was already defined to be the Kronecker factor of $X$. Below
(Corollary 4.3) we show that the two definitions coincide. We write $\mathcal{Z}_{k-1}$
instead of $\mathcal{Z}_{k-1}(X)$ whenever it is possible without ambiguity.
This $\sigma$-algebra is clearly invariant under $T$ and so it defines a factor.
Let the associated factor system be denoted by $(Z_{k-1}(X), \mu_{k-1}, T)$ or
by $(Z_{k-1}, \mu_{k-1}, T)$ and let $\pi_{k-1} : X \to Z_{k-1}$ be the factor map.

4.2. Elementary properties. 

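Proposition 4.2.

i) Identify the $\sigma$-algebra $\mathcal{Z}_{k-1}$ on $X$ with the $\sigma$-algebra $\mathcal{J}^{[k]*}$ on
$X^{[k]*}$ by relation (4.1). Then $(X^{[k]}, \mu^{[k]})$ is the
relatively independent joining of $(X, \mu)$ and $(X^{[k]*}, \mu^{[k]*})$ over
their common $\sigma$-algebra $\mathcal{Z}_{k-1} = \mathcal{J}^{[k]*}$.

ii) For a bounded function $f$ on $X$, $|||f|||_k = 0$ if and only if
$E(f \mid \mathcal{Z}_{k-1}) = 0$.

iii) The measure $\mu^{[k]}$ is relatively independent over its projection
on $Z_{k-1}^{[k]}$; this means that when $f_\epsilon$, $\epsilon \in \{0,1\}^k$, are bounded
functions on $X$ then

(4.2)  $$\int \prod_{\epsilon\in\{0,1\}^k} f_\epsilon(x_\epsilon)\, d\mu^{[k]}(x) = \int \prod_{\epsilon\in\{0,1\}^k} E(f_\epsilon \mid \mathcal{Z}_{k-1})(x_\epsilon)\, d\mu^{[k]}(x) .$$

Moreover, $Z_{k-1}$ is the smallest factor of $X$ with this property.

iv) Every $T^{[k-1]}$-invariant subset of $X^{[k-1]}$ is measurable with respect
to $\mathcal{Z}_{k-1}^{[k-1]}$. Moreover $Z_{k-1}$ is the smallest factor of $X$
with this property.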

Proof. i) The meaning of this statement is perhaps not obvious and we begin
with some explanation. We have already introduced the factor map
$\pi_{k-1} : X \to Z_{k-1}$. As we identify $\mathcal{Z}_{k-1}$ with the $\sigma$-algebra $\mathcal{J}^{[k]*}$ on $X^{[k]*}$
we have also a factor map $p_{k-1} : X^{[k]*} \to Z_{k-1}$ with $\mathcal{J}^{[k]*} = p_{k-1}^{-1}(\mathcal{Z}_{k-1})$.
Relation (4.1) can be written

(4.3)  $\pi_{k-1}(x_{\mathbf{0}}) = p_{k-1}(\tilde{x})$ for $\mu^{[k]}$-almost every $x = (x_{\mathbf{0}}, \tilde{x})$.

When $F$ is an integrable function on $X^{[k]*}$, following our standard
convention we write $E(F \mid Z_{k-1})$ for the function on $Z_{k-1}$ given by
$E(F \mid Z_{k-1}) \circ p_{k-1} = E(F \mid \mathcal{J}^{[k]*})$. Statement i) means that for every
bounded function $f$ on $X$ and every bounded function $F$ on $X^{[k]*}$,

(4.4)  $$\int f(x_{\mathbf{0}})\, F(\tilde{x})\, d\mu^{[k]}(x) = \int E(f \mid Z_{k-1})\, E(F \mid Z_{k-1})\, d\mu_{k-1} .$$

This relation is similar to formula (3.1) used to define the measure
$\mu^{[k+1]}$ in Section 3.

Let $f$ and $F$ be as above. For $i = 1, 2, \dots, k$ the measure $\mu^{[k]}$ and
the function $x \mapsto f(x_{\mathbf{0}})$ on $X^{[k]}$ are invariant under the transformation
$T_i^{[k]} = \mathrm{Id} \times T_i^{[k]*}$. Therefore the first integral in (4.4) remains unchanged
when the function $F$ is replaced by its conditional expectation with
respect to $\mathcal{J}^{[k]*}$ and this integral is equal to

$$\int f(x_{\mathbf{0}})\, E(F \mid Z_{k-1}) \circ p_{k-1}(\tilde{x})\, d\mu^{[k]}(x) .$$

By using (4.3) we rewrite this integral as

$$\int f(x_{\mathbf{0}})\, E(F \mid Z_{k-1}) \circ \pi_{k-1}(x_{\mathbf{0}})\, d\mu(x_{\mathbf{0}}) .$$

In this last integral we can replace the function $f$ by its conditional
expectation with respect to $\mathcal{Z}_{k-1}$ and equality (4.4) follows.

ii) Assume that $E(f \mid Z_{k-1}) = 0$. Using (4.4) with $F(\tilde{x})$ equal to
the product of the functions $C^{|\epsilon|} f(x_\epsilon)$ for $\epsilon \in \{0,1\}^k \setminus \{\mathbf{0}\}$, we get that
$|||f|||_k = 0$.

Assume conversely that $|||f|||_k = 0$. By using Lemma 3.6 and a density
argument, we get that the first integral in (4.4) is equal to zero for
any choice of the function $F$ on $X^{[k]*}$ and thus in particular when
$F = E(f \mid Z_{k-1}) \circ p_{k-1}$. In this case we have $E(F \mid Z_{k-1}) = E(f \mid Z_{k-1})$
and the second integral in (4.4) is equal to $\int E(f \mid Z_{k-1})^2\, d\mu_{k-1}$. As it
is also equal to zero we have $E(f \mid Z_{k-1}) = 0$.

iii) By again using the equality (4.4) we get that the first integral
in (4.2) remains unchanged when the function $E(f_{\mathbf{0}} \mid \mathcal{Z}_{k-1})$ is
substituted for the function $f_{\mathbf{0}}$. The same property holds for the other
vertices $\epsilon \in \{0,1\}^k$ because of the symmetry of the measure $\mu^{[k]}$ (see
Lemma 3.5) and this gives (4.2).

Let $Y$ be a factor of $X$ with the same property. For each bounded
function $f$ on $X$ with $E(f \mid \mathcal{Y}) = 0$, equality (4.2) with all functions
equal to $f$ gives $|||f|||_k = 0$, and thus $E(f \mid \mathcal{Z}_{k-1}) = 0$ by ii). This shows
that $\mathcal{Y} \supset \mathcal{Z}_{k-1}$ and completes the proof of iii).

iv) Let $A$ be a $T^{[k-1]}$-invariant subset of $X^{[k-1]}$. By construction
of the measure $\mu^{[k]}$ we have $1_A(x') = 1_A(x'')$ for $\mu^{[k]}$-almost every
$x = (x', x'') \in X^{[k]}$ and thus $\mu^{[k]}(A \times A) = \mu^{[k-1]}(A)$. On the other
hand,

$$\mu^{[k]}(A \times A) = \int 1_A \otimes 1_A \, d\mu^{[k]} = \int E(1_A \mid \mathcal{Z}_{k-1}^{[k-1]}) \otimes E(1_A \mid \mathcal{Z}_{k-1}^{[k-1]}) \, d\mu^{[k]}$$

where the second equality holds because the measure $\mu^{[k]}$ is relatively
independent over $Z_{k-1}^{[k]}$. Moreover, as the $\sigma$-algebra $\mathcal{Z}_{k-1}^{[k-1]}$ and the
set $A$ are invariant under $T^{[k-1]}$, the function $E(1_A \mid \mathcal{Z}_{k-1}^{[k-1]})$ is
invariant under this transformation and by construction of the measure
$\mu^{[k]}$ we have $E(1_A \mid \mathcal{Z}_{k-1}^{[k-1]})(x') = E(1_A \mid \mathcal{Z}_{k-1}^{[k-1]})(x'')$ for $\mu^{[k]}$-almost
every $x = (x', x'') \in X^{[k]}$. The last integral is therefore equal to
$\int E(1_A \mid \mathcal{Z}_{k-1}^{[k-1]})^2 \, d\mu^{[k-1]}$. We get that the function $1_A$ and its
conditional expectation on $\mathcal{Z}_{k-1}^{[k-1]}$ have the same norm in $L^2(\mu^{[k-1]})$ and it
follows that they are equal and that $A$ is measurable with respect to
$\mathcal{Z}_{k-1}^{[k-1]}$.

The announced minimality property of $Z_{k-1}$ can be proven by a
method similar to the one used in the proof of iii). □



Corollary 4.3. $Z_0$ is the trivial factor of $X$ and $Z_1$ is its Kronecker
factor. The sequence of factors $\{Z_k : k \ge 0\}$ is increasing.

We therefore have a chain of factor maps:

(4.5)  $$X \to \cdots \to Z_{k+1} \to Z_k \to Z_{k-1} \to \cdots \to Z_1 \to Z_0 .$$

Proof. All these properties follow from part ii) of the Proposition by
using the formula $|||f|||_1 = |\int f \, d\mu|$ for the first one, Lemma 3.2 for the
second one and the ordering of the seminorms (see Section 3) for the
last one. □



4.3. Systems of order k. 

Definition. Let $k \ge 0$ be an integer. A system of order $k$ is an ergodic
system $X$ with $Z_k(X) = X$.

It is easy to check that for any ergodic system $X$ we have $Z_k(Z_k(X)) =
Z_k(X)$ and thus that $Z_k(X)$ is a system of order $k$. By Corollary 4.3
there exists a unique system of order zero, the trivial system. The
systems of order 1 are those which are equal to their Kronecker factor,
that is the ergodic rotations (see Subsection 3.3.2). Every system of
order $k$ is also a system of order $\ell$ for every $\ell > k$.

4.3.1. Reduction to systems of order k. We explain here how it suffices
to prove the convergence theorems for systems of order $k$ (for some $k$).
Let $(X, \mu, T)$ be an ergodic system, $f_1, f_2, \dots, f_k$ be bounded functions
on $X$ and consider the averages

(1.2)  $$\frac{1}{N} \sum_{n=0}^{N-1} f_1(T^n x)\, f_2(T^{2n} x) \cdots f_k(T^{kn} x)$$

as in Theorem 1.2. Fix $i \in \{1, 2, \dots, k\}$. By part ii) of Proposition 4.2,
$|||f_i - E(f_i \mid \mathcal{Z}_{k-1})|||_k = 0$ and thus by Proposition 3.3 the difference
between the averages (1.2) and the same averages with $E(f_i \mid \mathcal{Z}_{k-1})$
substituted for $f_i$ converges to zero in $L^2(\mu)$. In order to prove
Theorem 1.2 we can thus assume without loss that all the functions
are measurable with respect to $\mathcal{Z}_{k-1}$. Therefore we can assume that
the functions are defined on the associated factor system $Z_{k-1}(X)$: we
say that this factor is a characteristic factor for the convergence of
the averages (1.2). As $Z_{k-1}(X)$ is a system of order $k - 1$, it is
sufficient to prove the convergence of these averages under the additional
hypothesis that $X$ is a system of order $k - 1$.
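
The substitution of all $k$ functions, one at a time, can be made explicit by a telescoping identity. The following is a sketch, with the shorthand $\tilde{f}_i = E(f_i \mid \mathcal{Z}_{k-1})$ introduced here only for this remark:

```latex
\prod_{i=1}^{k} f_i(T^{in}x) \;-\; \prod_{i=1}^{k} \tilde{f}_i(T^{in}x)
  \;=\; \sum_{j=1}^{k}
  \Bigl(\prod_{i<j} \tilde{f}_i(T^{in}x)\Bigr)\,
  \bigl(f_j - \tilde{f}_j\bigr)(T^{jn}x)\,
  \Bigl(\prod_{i>j} f_i(T^{in}x)\Bigr) .
```

Each of the $k$ terms on the right is a product of the form appearing in the averages (1.2) whose $j$-th function $f_j - \tilde{f}_j$ has seminorm $|||\cdot|||_k$ equal to zero, while the other functions remain bounded because $\|\tilde{f}_i\|_\infty \le \|f_i\|_\infty$. After normalizing the sup norms, the bound (3.8) shows that the average over $n$ of each term tends to zero in $L^2(\mu)$.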

The same method applies to the polynomial averages of Theorem 1.3:
for every polynomial family $p_1, p_2, \dots, p_k$ there exists by Proposition 3.4
an integer $\ell$ such that it suffices to show the convergence under the
additional hypothesis that the system is of order $\ell$.

4.3.2. Reduction to nilsystems. In the rest of the paper we establish
a relation between the systems of order $k$ and the $k$-step nilsystems
described in Section 2. We need a definition.

Definition. For each integer $i \ge 1$ let $(X_i, \mu_i, T)$ be a factor of the
system $(X, \mu, T)$ and assume that this sequence is increasing, meaning
that the sequence $\{\mathcal{X}_i\}$ of associated sub-$\sigma$-algebras of $\mathcal{X}$ is increasing.
We say that $X$ is the inverse limit (or the projective limit) of the
sequence $\{X_i\}$ if $\mathcal{X} = \bigvee_i \mathcal{X}_i$, that is, if $\mathcal{X}$ is generated by $\bigcup_i \mathcal{X}_i$ up to null sets.

Structure Theorem ([HK2], Theorem 10.1). For every $k \ge 1$, every
system of order $k$ is an inverse limit of a sequence of $k$-step nilsystems.

The convergence results (Theorems 1.2 and 1.3) follow easily since
they hold for nilsystems (Corollaries 2.4 and 2.5) and pass to inverse
limits.

The proof of the Structure Theorem is the longest and the most
technical part of the proofs of convergence and in the next section we
can only give a relatively vague idea of the strategy. To understand
why this proof is long it is perhaps interesting to compare the two
notions involved in the theorem. Systems of order $k$ are defined in
terms of abstract ergodic theory, without any mention of a topology
or a differentiable structure on the space. The only ingredients of
this poor structure are a probability space and a measure preserving
transformation. On the other hand, nilsystems are defined in terms of
Lie groups and clearly have a rich structure. Proving the Structure
Theorem therefore forces us to build this rich structure from scratch.

4.4. Notes on Section 4.

4.4.1. Characteristic factors. The notion of a characteristic factor was
already used (without the name) in Furstenberg's original paper [Fu2]
and applies to a wide range of questions. Assume for example that we
are dealing with the limit behavior of a sequence of averages depending
on some functions. We say that a factor $Y$ of $X$ is characteristic if
the difference between the given averages and the averages with each
function replaced by its conditional expectation on $Y$ converges to zero
(for the notion of convergence in question). We are therefore left with
studying the limit behavior with $Y$ substituted for the given system
and this can be much easier if $Y$ has a "rich" structure.

Clearly characteristic factors are not unique: a factor containing
a characteristic one is characteristic too. It can be proven that the
factors $Z_k$ defined here are the smallest possible for our convergence
problems. Furstenberg used the much larger maximal distal factor. The
"structure" of this factor is much weaker than that of a nilmanifold but
is sufficiently rich to make the proof of Furstenberg's Theorem possible.

4.4.2. The case of a nilsystem. The factors $Z_k(X)$ can be described
explicitly for nilsystems.

Let $(X, \mu, T)$ be an ergodic $\ell$-step nilsystem. We use the notation
of Section 2 and assume moreover that the group $G$ is spanned by its
connected component of the identity and the element $t$ defining the
transformation $T$ (it is always possible to reduce to this case). For
every $k \ge 1$, $(\Lambda G_{k+1})/G_{k+1}$ is a discrete and cocompact subgroup of
the nilpotent Lie group $G/G_{k+1}$, and $Z_k(X)$ is the nilsystem

$$Z_k(X) = \frac{G/G_{k+1}}{(\Lambda G_{k+1})/G_{k+1}} = \frac{G}{G_{k+1}\Lambda}$$

endowed with translation by the projection of $t$ on $G/G_{k+1}$. This
result was already proved by Parry ([P1]) and Leibman ([L2]) for the
Kronecker factor corresponding to the case $k = 1$. Using this formula
with $k = \ell$ we get that every ergodic $\ell$-step nilsystem is a system of
order $\ell$.

5. On the way to the Structure Theorem

5.1. A group associated to an ergodic system. To every ergodic
system $(X, \mu, T)$ we associate a group $\mathcal{G}(X)$ of measure preserving
transformations. The strategy consists in showing that for sufficiently
many systems of order $k$ this group is a nilpotent Lie group and acts
"transitively", so that the system can be given the structure of a
nilsystem. We need some notation.

Let $g$ be a measure preserving transformation on $X$, written $x \mapsto g \cdot x$,
and let $k \ge 1$ be an integer. For each side $\alpha$ of $\{0,1\}^k$ we define the
transformation $g_\alpha^{[k]}$ of $X^{[k]}$ by

$$(g_\alpha^{[k]} \cdot x)_\epsilon = \begin{cases} g \cdot x_\epsilon & \text{if } \epsilon \in \alpha ; \\ x_\epsilon & \text{otherwise.} \end{cases}$$

This definition coincides with that of the side transformation in the 
case that g = T. 

Definition. $\mathcal{G}(X)$ is the group of measure preserving transformations $g$
of $X$ such that for every integer $k \ge 1$ and every side $\alpha$ of $\{0,1\}^k$, the
transformation $g_\alpha^{[k]}$ leaves the measure $\mu^{[k]}$ invariant.

The proofs of the following results can be found in Section 5 of [HK2].
We write $\mathcal{G}$ instead of $\mathcal{G}(X)$ except when some ambiguity can occur.
This group is a Polish group when endowed with the topology of
convergence in probability. It contains $T$ and every measure preserving
transformation of $X$ commuting with $T$.

Lemma 5.1. Let $g \in \mathcal{G}(X)$. Then for every $k$ the transformation $g$
of $X$ maps the factor $Z_k$ to itself and thus induces a transformation of
$Z_k$, which belongs to $\mathcal{G}(Z_k)$.

Lemma 5.2. If $X$ is a system of order $k$ then $\mathcal{G}(X)$ is a $k$-step nilpotent
group.

Proof. Let $g_1, g_2, \dots, g_{k+1} \in \mathcal{G}$ and $h = [[\dots[g_1, g_2], g_3], \dots, g_{k+1}]$.

Let $\alpha_1, \alpha_2, \dots, \alpha_{k+1}$ be the $k + 1$ sides of $\{0,1\}^{k+1}$ containing $\mathbf{0}$. The
measure $\mu^{[k+1]}$ is invariant under each transformation $(g_i)_{\alpha_i}^{[k+1]}$ and thus
also under their commutator $[[\dots[(g_1)_{\alpha_1}^{[k+1]}, (g_2)_{\alpha_2}^{[k+1]}], (g_3)_{\alpha_3}^{[k+1]}], \dots, (g_{k+1})_{\alpha_{k+1}}^{[k+1]}]$. But
it is easy to check that this transformation is equal to the transformation
$h_{\mathbf{0}}^{[k+1]}$ given by $(h_{\mathbf{0}}^{[k+1]} \cdot x)_{\mathbf{0}} = h \cdot x_{\mathbf{0}}$ and $(h_{\mathbf{0}}^{[k+1]} \cdot x)_\epsilon = x_\epsilon$ for
$\epsilon \ne \mathbf{0}$.

Let $A$ be a subset of $X$. As $X$ is of order $k$, $Z_k(X) = X$ and by
definition there exists a subset $B$ of $X^{[k+1]*}$ with $1_A(x_{\mathbf{0}}) = 1_B(\tilde{x})$ for
$\mu^{[k+1]}$-almost every $x$. Applying the transformation $h_{\mathbf{0}}^{[k+1]}$ we get that
$1_A(h \cdot x_{\mathbf{0}}) = 1_B(\tilde{x})$ ($\mu^{[k+1]}$-a.e.) and thus that $1_A(x_{\mathbf{0}}) = 1_A(h \cdot x_{\mathbf{0}})$
($\mu$-a.e.). This shows that every subset $A$ of $X$ is invariant under $h$ and
that $h = \mathrm{Id}$. □

5.2. Relations between two consecutive factors. 

Definition. Let $(Y, \nu, T)$ be a system, $U$ a compact abelian group
endowed with its Haar measure $m_U$ and $\rho : Y \to U$ a measurable map.
Let $X = Y \times U$ be endowed with the measure $\mu = \nu \times m_U$ and with
the transformation $T$ given by $T(y, u) = (Ty, u + \rho(y))$. Then we say
that $X$ is an extension of $Y$ by $U$ and $\rho$ is called the cocycle defining
the extension.

We note that $Y$ is a factor of $X$, with the factor map given by
$(y, u) \mapsto y$. For each $v \in U$, the transformation $R_v : (y, u) \mapsto (y, u + v)$
of $X$ preserves $\mu$ and commutes with $T$; it is called a vertical rotation.
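
The definition of an extension by a cocycle can be made concrete with a small sketch. Everything in it is an illustrative assumption rather than part of the text: the base is $Y = \mathbb{R}/\mathbb{Z}$ with the rotation $y \mapsto y + \alpha$, the fiber group is $U = \mathbb{R}/\mathbb{Z}$, the cocycle is $\rho(y) = y$, and the helper names `make_extension` and `vertical_rotation` are hypothetical. Rational values are used only so that the arithmetic is exact; a rational $\alpha$ does not give an ergodic rotation.

```python
from fractions import Fraction

def make_extension(T_base, rho):
    """Return the map T(y, u) = (T_base(y), u + rho(y)) on Y x U, with U = R/Z."""
    def T(point):
        y, u = point
        return (T_base(y), (u + rho(y)) % 1)
    return T

def vertical_rotation(v):
    """Return the vertical rotation R_v : (y, u) -> (y, u + v)."""
    def R(point):
        y, u = point
        return (y, (u + v) % 1)
    return R

# Hypothetical example: base rotation by alpha on R/Z and cocycle rho(y) = y.
alpha = Fraction(2, 7)
T = make_extension(lambda y: (y + alpha) % 1, lambda y: y)
R = vertical_rotation(Fraction(1, 3))
p = (Fraction(1, 5), Fraction(3, 4))
```

Here `T(R(p))` and `R(T(p))` coincide, because the cocycle depends only on the base coordinate $y$, which vertical rotations leave fixed.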

Proposition 5.3. Let $(X, \mu, T)$ be a system of order $k$ and $(Y, \nu, T) =
Z_{k-1}(X)$. Then $X$ is an extension of $Y$ by a compact abelian group $U$.
Moreover, the cocycle $\rho : Y \to U$ defining this extension satisfies:
There exists a map $F : Y^{[k]} \to U$ with

(5.1)  $$\sum_{\epsilon\in\{0,1\}^k} (-1)^{|\epsilon|} \rho(y_\epsilon) = F(T^{[k]} y) - F(y)$$

for $\nu^{[k]}$-almost every $y \in Y^{[k]}$.

A cocycle $\rho$ satisfying the functional equation (5.1) for some map $F$
is called a cocycle of type $k$.

Idea of the proof. The first step of the proof uses the notion of an
isometric extension as defined by Furstenberg (see [Fu3]). Part i) of
Proposition 4.2 implies that the $T^{[k]}$-invariant $\sigma$-algebra $\mathcal{I}^{[k]}$ of $X^{[k]}$
is measurable with respect to $\mathcal{W}^{[k]}$, where $W$ is some factor of $X$ which
is an isometric extension of $Y$. The minimality property iv) of the
same Proposition then gives that $X = W$, that is, $X$ is an isometric
extension of $Y$.

In particular there exists a compact group $U$, acting on $X$ by measure
preserving transformations and inducing the trivial transformation on
$Y$. It is then proven that this group of transformations is included in
the center of $\mathcal{G}(X)$ and in particular is an abelian group; this shows
that $X$ is an extension of $Y$ by the compact abelian group $U$. We
identify $X$ with $Y \times U$. Let $\rho : Y \to U$ be the cocycle defining this
extension.

Let $\chi$ be a character of $U$, that is a continuous group homomorphism
from $U$ to the circle group $S^1$. Let $\phi$ be the function on $X = Y \times U$
defined by $\phi(y, u) = \chi(u)$ and $\Phi$ the function defined on $X^{[k]} = Y^{[k]} \times U^{[k]}$ by

$$\Phi(y, u) = \prod_{\epsilon\in\{0,1\}^k} C^{|\epsilon|} \phi(x_\epsilon) = \chi\Bigl( \sum_{\epsilon\in\{0,1\}^k} (-1)^{|\epsilon|} u_\epsilon \Bigr)$$

for $x = (y, u)$ with $y \in Y^{[k]}$ and $u \in U^{[k]}$.

As $X$ is of order $k$, $|||\phi|||_{k+1} \ne 0$ by part ii) of Proposition 4.2. By
construction of the seminorms this means that the function $\Psi = E(\Phi \mid \mathcal{I}^{[k]})$
is not identically zero. This function is invariant under $T^{[k]}$ and
satisfies $\Psi(R_u x) = \chi(u) \Psi(x)$ for every $u \in U$ and almost every $x$.
By using the ergodicity property (Lemma 3.8) of $\mu^{[k]}$ it can
be shown that there exists a function with the same properties and
everywhere nonzero, and thus a function $F_\chi$ of modulus 1 with the
same properties. The invariance of this function gives:

$$\chi\Bigl( \sum_{\epsilon\in\{0,1\}^k} (-1)^{|\epsilon|} \rho(y_\epsilon) \Bigr) = F_\chi(T^{[k]} y)\, F_\chi(y)^{-1} .$$

The existence of a function with this property for every character $\chi$
of $U$ implies the existence of a function $F$ satisfying (5.1) by classical
results about cocycles. □

5.3. A technical tool. From this point, the proof of the Structure 
Theorem proceeds by induction on k. We give here a technical tool 
used in the induction. 

For every $s$ in the Kronecker factor $Z_1$ we have defined in Section 3
a measure $\mu_s$ on $X \times X$, invariant under $T \times T$. For almost
every $s$, the system $(X \times X, \mu_s, T \times T)$ is ergodic and we denote it by
$X_s$.

Proposition 5.4. Let the hypotheses and the notation be as in
Proposition 5.3. Then, for almost every $s \in Z_1$, $X_s$ is a system of order $k$,
$Y_s$ a system of order $k - 1$ and $X_s$ is an extension of $Y_s$ by the compact
abelian group $U \times U$, given by the cocycle $(y_0, y_1) \mapsto (\rho(y_0), \rho(y_1))$ which
is of type $k$.

Moreover, $Z_{k-1}(X_s)$ is an extension of $Y_s$ by $U$, given by the cocycle
$(y_0, y_1) \mapsto \rho(y_0) - \rho(y_1)$ which is of type $k - 1$.

This proposition plays the role of a stepladder, allowing us to climb
from one level to the next one: assume that some properties have been
shown for systems of order $k - 1$ and let $X$ be a system of order $k$. Let
$U$ and $\rho$ be as in Proposition 5.3. Then we can use these properties for
the systems $Z_{k-1}(X_s)$ and this gives information on the group $U$ and
the cocycle $\rho$. This method is used in particular to prove:

Lemma 5.5. Let $X$, $Y$, $U$ and $\rho$ be as in Proposition 5.3. Then $U$ is
connected.

5.4. Toral systems. Recall that a compact abelian group can be given 
a structure of Lie group if and only if the connected component of
its unit element is a finite dimensional torus. This property is
equivalent to saying that its dual group is finitely generated. It follows 
that every compact (metrizable) abelian group can be represented as an 
inverse limit of a sequence of compact abelian Lie groups. In particular 
every compact connected abelian group can be represented as an inverse 
limit of a sequence of finite dimensional tori. This motivates the next 
definition. 

Definition. Let $k \ge 1$ be an integer and $(X, \mu, T)$ be a system of order
$k$. We say that this system is toral if its Kronecker factor $Z_1(X)$ is a
compact abelian Lie group and if $Z_j(X)$ is an extension of $Z_{j-1}(X)$ by
a finite dimensional torus for $2 \le j \le k$.

The Structure Theorem can be split into the next two Propositions:

Proposition 5.6. For every integer $k \ge 1$, every system of order $k$ is
the inverse limit of a sequence of toral systems of order $k$.

Proposition 5.7. Every toral system of order $k$ is a $k$-step nilsystem.

More precisely, $\mathcal{G}(X)$ is a Lie group and acts transitively on $X$. $X$
can be identified with $G(X)/\Lambda$ where $G(X)$ is the subgroup of $\mathcal{G}(X)$
spanned by the connected component of its identity and $T$. We give
an idea of the method of the proof.

Let $X$ be a system of order $k$; we use the notation of Proposition 5.3.
As we assume that the result holds for systems of order $k - 1$ it holds
in particular for $Y = Z_{k-1}(X)$. Each element $h$ of $G(Y)$ is lifted to an
element $\tilde{h}$ of $G(X)$; this means that $h$ is the transformation induced by
$\tilde{h}$ on $Y$ as in Lemma 5.1. This is done first for $h$ belonging to $G(Y)_{k-1}$,
then for $h \in G(Y)_{k-2}$, and so on, following the lower central series of
$G(Y)$ upwards. Each step uses the functional equation (5.1) satisfied
by the cocycle $\rho$.

Appendix: Further comments 

A.1. The cubic averages. The paper [HK2] also contains the proof
of convergence of another type of multiple ergodic average, the cubic
averages, already proven by Bergelson for the cubes of dimension 2:

Theorem (Bergelson [B2]). Let $(X, \mu, T)$ be a system and let $f, g, h$ be
three bounded functions on $X$. Then the averages

$$\frac{1}{N^2} \sum_{m,n=0}^{N-1} f(T^m x)\, g(T^n x)\, h(T^{m+n} x)$$

converge in $L^2(\mu)$ as $N \to +\infty$.

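For the general case we use here the notation introduced at the top
of Section 3.

Theorem A.8 (Host & Kra [HK2]). Let $(X, \mu, T)$ be a system and
let $f_\epsilon$, $\epsilon \in \{0,1\}^k \setminus \{\mathbf{0}\}$, be $2^k - 1$ bounded functions on $X$. Then the
averages

(A.2)  $$\frac{1}{N^k} \sum_{n_1, n_2, \dots, n_k = 0}^{N-1} \; \prod_{\epsilon\in\{0,1\}^k,\ \epsilon\ne\mathbf{0}} f_\epsilon(T^{n\cdot\epsilon} x)$$

converge in $L^2(\mu)$.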



The strategy for the proof of this theorem is the same as for the
averages along arithmetic progressions and for the polynomial averages.
Clearly it suffices to prove the result for ergodic systems. First, a bound
similar to (3.8) holds for the cubic averages, with a similar proof using a
multidimensional version of the van der Corput Lemma. Here $(X, \mu, T)$
is an ergodic system.

Proposition. Let $f_\epsilon$, $\epsilon \in \{0,1\}^k \setminus \{\mathbf{0}\}$, be $2^k - 1$ functions on $X$ with
$\|f_\epsilon\|_\infty \le 1$ for every $\epsilon$. The limsup of the norm in $L^2(\mu)$ of the
averages (A.2) is bounded by $\min_\epsilon |||f_\epsilon|||_k$.

By the same arguments as in Subsections 4.3.1 and 4.3.2 it is therefore
possible to restrict to the case that $X$ is a $(k - 1)$-step nilsystem.
Then Theorem A.8 follows from a generalization of Theorems 2.1
and 2.2 and of Corollary 2.3 to the case of several commuting
translations on a nilmanifold.

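The cubic averages are directly linked to the measures $\mu^{[k]}$:

Proposition. Let $f_\epsilon$, $\epsilon \in \{0,1\}^k$, be $2^k$ functions on $X$. Then

$$\frac{1}{N^k} \sum_{n_1, n_2, \dots, n_k = 0}^{N-1} \int \prod_{\epsilon\in\{0,1\}^k} f_\epsilon(T^{n\cdot\epsilon} x)\, d\mu(x) \;\to\; \int \prod_{\epsilon\in\{0,1\}^k} f_\epsilon(x_\epsilon)\, d\mu^{[k]}(x) .$$

By using this result with all functions equal to $1_A$ and the inequality
$|||1_A|||_k \ge |||1_A|||_1 = \mu(A)$ we get:

Corollary. For every subset $A$ of $X$,

$$\lim_{N\to+\infty} \frac{1}{N^k} \sum_{n_1, n_2, \dots, n_k = 0}^{N-1} \mu\Bigl( \bigcap_{\epsilon\in\{0,1\}^k} T^{-n\cdot\epsilon} A \Bigr) \ge \mu(A)^{2^k} .$$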
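
The inequality of this Corollary can be checked by hand in a finite model. The sketch below is an illustration under assumptions not made in the text: it takes $X = \mathbb{Z}/N\mathbb{Z}$ with the uniform measure and $T(x) = x + 1 \bmod N$, and averages over exactly one period $[0, N)^k$, so the same letter $N$ serves as the modulus and as the length of the average. The function name `cube_average` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def cube_average(A, N, k):
    """Average over n in [0, N)^k of the measure of the intersection,
    over all vertices eps of {0,1}^k, of the sets A - n.eps in Z/NZ."""
    A = set(A)
    vertices = list(product((0, 1), repeat=k))
    total = 0
    for n in product(range(N), repeat=k):
        # n.eps = n_1 eps_1 + ... + n_k eps_k for each vertex eps
        shifts = [sum(ni * ei for ni, ei in zip(n, eps)) for eps in vertices]
        # count the points x such that x + n.eps lies in A for every vertex eps
        total += sum(all((x + s) % N in A for s in shifts) for x in range(N))
    return Fraction(total, N ** (k + 1))
```

For instance, with $A = \{0, 1, 3\}$ in $\mathbb{Z}/7\mathbb{Z}$ one can compare `cube_average(A, 7, 2)` with `Fraction(3, 7) ** 4`, the value of $\mu(A)^{2^k}$ for $k = 2$.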



The same results hold when the averages on $[0, N)^k$ are replaced
by averages on a sequence of parallelepipeds whose minimal size tends
to infinity, or more generally by averages on a Følner sequence. This
version of the last Corollary has a combinatorial interpretation in terms
of sets of integers of positive upper density but we do not state it here
(see [HK2]).

The convergence a.e. of the averages (A.2) has been recently proven
by Assani ([As]).

A.2. Gowers norms and their relations to arithmetic progressions.

A.2.1. The definition. When writing our paper [HK2] we became aware
of Gowers' paper [G1] where he introduced some norms very similar to
our seminorms. It turns out that they are identical to our seminorms
when computed in the particular case that $X = \mathbb{Z}/N\mathbb{Z}$ endowed with
the uniform measure and with the transformation $x \mapsto x + 1 \bmod N$.
These norms were extensively used by Green and Tao ([GrT], [T]). We
recall here their definition, with notation modified in order to fit with
ours.

Let $N \ge 2$ be an integer and let $G = \mathbb{Z}/N\mathbb{Z}$ be endowed with its
normalized Haar measure $m$. Let $C(G)$ denote the space of complex
valued functions on $G$. For $f \in C(G)$, $|||f|||_1$ is defined to be $|\int f(x)\, dm(x)|$.
For $t \in G$ let $f_t$ be the function $x \mapsto f(x + t)$ and define by induction

(A.3)  $$|||f|||_{k+1} = \Bigl( \int |||f \cdot \overline{f_t}|||_k^{2^k} \, dm(t) \Bigr)^{1/2^{k+1}} .$$

$|||f|||_k$ can also be defined by a closed formula. For $t = (t_1, t_2, \dots, t_k) \in
G^k$ and $\epsilon \in \{0,1\}^k$ we write $\epsilon \cdot t = \epsilon_1 t_1 + \epsilon_2 t_2 + \cdots + \epsilon_k t_k$. For $f \in C(G)$
and $k \ge 1$ we have by induction

(A.4)  $$|||f|||_k^{2^k} = \int \prod_{\epsilon\in\{0,1\}^k} C^{|\epsilon|} f(x + \epsilon\cdot t)\, dm(x)\, dm(t_1)\, dm(t_2) \cdots dm(t_k) ,$$

where $C$ denotes complex conjugation and $|\epsilon| = \epsilon_1 + \cdots + \epsilon_k$.

An inequality similar to the Cauchy-Schwarz Inequality follows: when
$f_\epsilon$, $\epsilon \in \{0,1\}^k$, are functions on $G$,

(A.5)  $$\Bigl| \int \prod_{\epsilon\in\{0,1\}^k} f_\epsilon(x + \epsilon\cdot t)\, dm(x)\, dm(t_1)\, dm(t_2) \cdots dm(t_k) \Bigr| \le \prod_{\epsilon\in\{0,1\}^k} |||f_\epsilon|||_k .$$
This implies that each map $f \mapsto |||f|||_k$ is subadditive and hence a
seminorm. For $k = 2$, relation (A.4) gives:

$$|||f|||_2 = \Bigl( \sum_{\xi\in\widehat{G}} |\widehat{f}(\xi)|^4 \Bigr)^{1/4}$$

where $\widehat{f}$ is the Fourier Transform of $f$, defined on the dual group $\widehat{G} = G$
of $G$. This expression can only be zero when $f$ is the zero function and
thus $|||\cdot|||_2$ is a norm on $C(G)$. By induction and by using definition (A.3),
it can be checked that for every $f \in C(G)$ we have $|||f|||_{k+1} \ge |||f|||_k$ for
every $k$ and thus $|||\cdot|||_k$ is a norm on $C(G)$ for every $k \ge 2$.
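
The inductive definition and the Fourier formula for $k = 2$ can be compared numerically. The following is a minimal sketch, not an implementation from [G1] or [HK2]: it assumes the standard recursion $|||f|||_{k+1}^{2^{k+1}} = \int |||f \cdot \overline{f_t}|||_k^{2^k}\, dm(t)$ with $|||f|||_1 = |\int f\, dm|$, the normalization $\widehat{f}(\xi) = \frac{1}{N}\sum_x f(x) e^{-2\pi i x\xi/N}$, and hypothetical function names.

```python
import cmath

def gowers_power(f, k):
    """Return |||f|||_k ** (2 ** k) for a list f of values on Z/NZ, by recursion on k."""
    N = len(f)
    if k == 1:
        return abs(sum(f) / N) ** 2
    total = 0.0
    for t in range(N):
        # g = f * conj(f_t), where f_t(x) = f(x + t)
        g = [f[x] * f[(x + t) % N].conjugate() for x in range(N)]
        total += gowers_power(g, k - 1)
    return total / N

def gowers_norm(f, k):
    """Return |||f|||_k."""
    return gowers_power(f, k) ** (1.0 / 2 ** k)

def fourier_l4(f):
    """Return the l^4 norm of the Fourier coefficients of f (normalized by 1/N)."""
    N = len(f)
    coeffs = [sum(f[x] * cmath.exp(-2j * cmath.pi * x * xi / N) for x in range(N)) / N
              for xi in range(N)]
    return sum(abs(c) ** 4 for c in coeffs) ** 0.25
```

For the indicator function of $\{0, 1, 3\}$ in $\mathbb{Z}/7\mathbb{Z}$, written as the list `[1, 1, 0, 1, 0, 0, 0]`, `gowers_norm(f, 2)` and `fourier_l4(f)` agree up to rounding, and the values of `gowers_norm(f, k)` for $k = 1, 2, 3$ are nondecreasing. The recursion costs on the order of $N^k$ operations, so it is only meant for small examples.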

A.2.2. Using Gowers norms. These norms are used by Gowers, Green
and Tao to control the integral (1.1) appearing in the analytic form of
Szemerédi's Theorem.

Proposition A.9. Let $N \ge 2$ be an integer and let $\mathbb{Z}/N\mathbb{Z}$ be endowed
with its normalized Haar measure. Let $\ell \ge 2$ be an integer and
$f_0, f_1, f_2, \dots, f_{\ell-1}$ be functions on $\mathbb{Z}/N\mathbb{Z}$ with $|f_i| \le 1$ for $0 \le i \le \ell - 1$.
Then

(A.6)  $$\Bigl| \int\!\!\int f_0(x)\, f_1(x + y)\, f_2(x + 2y) \cdots f_{\ell-1}(x + (\ell - 1)y)\, dm(x)\, dm(y) \Bigr| \le \min_{0\le j\le \ell-1} |||f_j|||_{\ell-1} .$$
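
The bound of Proposition A.9 can be tested numerically in the case of three functions, that is, for three-term progressions and the seminorm $|||\cdot|||_2$. The sketch below is only an illustration under stated assumptions: $N = 11$ is chosen prime so that multiplication by 2 is invertible modulo $N$, the three functions are pseudo-random with values of modulus at most 1, and all function names are hypothetical.

```python
import cmath
import random

def u2_norm(f):
    """Return |||f|||_2 on Z/NZ, computed as (E_t |E_x f(x) conj(f(x+t))|^2) ** (1/4)."""
    N = len(f)
    total = 0.0
    for t in range(N):
        corr = sum(f[x] * f[(x + t) % N].conjugate() for x in range(N)) / N
        total += abs(corr) ** 2
    return (total / N) ** 0.25

def three_term_average(f0, f1, f2):
    """Return the average over x, y in Z/NZ of f0(x) f1(x+y) f2(x+2y)."""
    N = len(f0)
    s = sum(f0[x] * f1[(x + y) % N] * f2[(x + 2 * y) % N]
            for x in range(N) for y in range(N))
    return s / N ** 2

def random_bounded(N, rng):
    """Return a pseudo-random function on Z/NZ with values of modulus at most 1."""
    return [rng.random() * cmath.exp(2j * cmath.pi * rng.random()) for _ in range(N)]

rng = random.Random(0)   # fixed seed, so that the example is reproducible
N = 11                   # a prime (assumption made for this sketch)
fs = [random_bounded(N, rng) for _ in range(3)]
lhs = abs(three_term_average(*fs))      # left-hand side of (A.6) for three functions
rhs = min(u2_norm(f) for f in fs)       # right-hand side of (A.6)
```

With these choices `lhs` does not exceed `rhs`, as the Proposition predicts; for random functions the left-hand side is typically much smaller than the bound.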

The starting point of Gowers' method ([G1]) can be summarized very
roughly as follows. Let $A$ be a subset of $G$, $f = 1_A$ and $g = f - m(A)$.
He distinguishes two cases: If $|||g|||_{\ell-1}$ is small then the integral in (1.1)
is close to the same integral with the constant $m(A)$ substituted for $f$
and thus is large. If $|||g|||_{\ell-1}$ is large then he shows that the restriction
of $f$ to some relatively large subset of $G$ has some strong arithmetic
properties and behaves in some respects like a polynomial; he deduces
that the integral is large in this case also.

The way Tao ([T]) and Green and Tao ([GrT]) use Gowers' norms
is much closer to ergodic theory and in particular to the way we use
the seminorms here. We try here to make this similarity apparent but
the reader must be advised that the content of the next few lines is
nothing more than an oversimplification.

Tao decomposes any function $f$ on $G$ as a sum of a function $g$ with
small norm and an "anti-uniform" function $h$. The contribution of $g$ to
the integral in (1.1) is small and the function $h$ can be viewed as the
conditional expectation of $f$ relative to some $\sigma$-algebra, comparable
to the factor used in [FuKO] except that it is not invariant under
translation. The contribution of $h$ to the integral is bounded from below by
using van der Waerden's Theorem.

The theorem of Green and Tao ([GrT]) about the existence of arithmetic
progressions in primes seems completely out of the range of ergodic
theory because the primes have zero density in the integers and
thus the Correspondence Principle does not apply. However the strategy
is similar. They show that in the preceding decomposition the
uniform norm of the function $h$ can be bounded independently of any
uniform bound of the function $f$, assuming only that this function is
bounded by a "quasirandom" function. Therefore the analytic form of
Szemerédi's Theorem can be used to show that the function $h$ gives a
"large" contribution to the integral in (1.1). Moreover the function $g$
also is bounded by a quasirandom function and its contribution to the
integral is small, due to an extension of Proposition A.9 to this case.
They therefore get a generalization of the analytic form of Szemerédi's
Theorem under the weaker hypothesis. This result is then used for a
function closely related to the indicator function of primes.

The decomposition used in both cases is parallel to the decomposition

$$f = E(f \mid \mathcal{Z}_{k-1}) + \bigl( f - E(f \mid \mathcal{Z}_{k-1}) \bigr)$$

used in Subsection 4.3.1 of this paper but the authors do not need a
precise description of the "factor" comparable to the Structure Theorem
and do not use the machinery of nilpotent groups. It can be
conjectured that there exists a hidden link between the combinatorial
constructions and the nilpotent groups and we believe that making this
link explicit is a very interesting challenge.